An Introduction to Scientific Computing

Twelve Computational Projects Solved with MATLAB


Ionut Danaila
Pascal Joly
Sidi Mahmoud Kaber
Marie Postel


Springer


Ionut Danaila
Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
Paris 75252
FRANCE
danaila@ann.jussieu.fr

Pascal Joly
Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
Paris 75252
FRANCE
Joly@ann.jussieu.fr

Sidi Mahmoud Kaber
Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
Paris 75252
FRANCE
kaber@ann.jussieu.fr

Marie Postel
Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
Paris 75252
FRANCE
postel@ann.jussieu.fr


Library of Congress Control Number: 2006931780 


ISBN-10: 0-387-30889-X 
ISBN-13: 978-0-387-30889-0 


Printed on acid-free paper. 


© 2007 Springer Science + Business Media, LLC 

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The
MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or
discussion of MATLAB® software or related products does not constitute endorsement or
sponsorship by The MathWorks of a particular pedagogical approach or particular use of the
MATLAB® software.

All rights reserved. This work may not be translated or copied in whole or in part without the 
written permission of the publisher (Springer Science + Business Media, LLC, 233 Spring Street, 
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly 
analysis. Use in connection with any form of information storage and retrieval, electronic 
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter 
developed is forbidden. 

The use in this publication of trade names, trademarks, service marks, and similar terms, even if 
they are not identified as such, is not to be taken as an expression of opinion as to whether or 
not they are subject to proprietary rights. 




springer.com 




October 13, 2006 


Springer 
Berlin Heidelberg New York 


Hong Kong London 
Milan Paris Tokyo 


To Alice, Luminita 
Romain, Sylvain 
Sarah, Thomas 
Camille, Paul 


Preface 


Teaching or learning numerical methods in applied mathematics cannot be 
conceived nowadays without numerical experimentation on computers. There 
is a vast literature devoted either to theoretical numerical methods or nu- 
merical programming of basic algorithms, but there are few texts offering a 
complete discussion of numerical issues involved in the solution of concrete 
and relatively complex problems. This book is an attempt to fill this need. 
It is our belief that advantages and drawbacks of a numerical method cannot 
be accounted for without one’s experiencing all the steps of scientific comput- 
ing, from physical and mathematical description of the problem to numerical 
formulation and programming and, finally, to critical discussion of numerical 
results. 

The book provides twelve computational projects aimed at numerically
solving problems selected to cover a broad spectrum of applications, including
fluid mechanics, chemistry, elasticity, thermal science, computer-aided design,
and signal and image processing. Even though the main volume of this text
concerns the numerical analysis of computational methods and their
implementation, we have tried to start, when possible, from realistic problems of
practical interest for researchers and engineers.

For each project, an introductory record card summarizes the mathemat- 
ical and numerical topics explained and the fields of application of the ap- 
proach. A level of difficulty, scaling from 1 to 3, is assigned to each project. 
Most of the projects are of level 1 or 2 and can be easily tackled; the reader 
will no doubt realize that projects of level 3 require a solid background in 
both numerical analysis and computational techniques. 

Excepting projects 1 and 3, which are more theoretical, all projects follow 
the typical steps of scientific computing: physical and mathematical modeling 
of the problem, numerical discretization, construction of a numerical algo- 
rithm, and, finally, programming. We have placed considerable emphasis on 
practical issues of computational methods that are not usually available in 
basic textbooks. Numerical checking of accuracy or stability, the choice of 
boundary conditions, the effective solving of linear systems, and comparison 
to exact solutions when available are only a few examples of problems
encountered in the application of numerical methods. The last section of each
project contains solutions of all proposed exercises and guides the reader in
using the MATLAB scripts that can be accessed via the publisher's web site
www.springer.com. Programming techniques such as vectorized programming
and memory storage optimization are also addressed. We finally discuss the
physical meaning of the obtained results. The complementary references given 
at the end of each chapter form a guide for further, more specialized, reading. 

The text offers two levels of interest. The mathematical framework pro- 
vides a basic grounding in the subject of numerical analysis of partial differ- 
ential equations and main discretization techniques (finite differences, finite 
elements, spectral methods, wavelets). Meanwhile, we hope that the informa- 
tion contained herein and the wide range of topics covered by the book will 
allow the reader to select the appropriate numerical method to solve his or 
her particular problem. 

The book is based on material offered by the authors in courses at Univer- 
sité Pierre et Marie Curie (Paris, France) and different engineering schools. 
It is primarily intended as a graduate-level text in applied mathematics, but 
it may also be used by students in engineering or physical sciences. It will also 
be a useful reference for researchers and practicing engineers. Since different 
possible developments of the projects are suggested, the text can be used to 
propose assignments at different graduate levels. 

Despite our efforts to avoid typing, spelling, or other errors, the reader 
will no doubt find some remaining. We shall appreciate all feedback notifying 
us of any mistakes, as well as comments and suggestions that will help us to 
improve the text. Please use the e-mail addresses given below for this purpose. 

We conclude by saying a few words about the programs provided with 
this book. They are written in MATLAB, a widely used software environment 
for scientific computing produced by The MathWorks Inc. We consider that 
an interpreted language (such as MATLAB, SCILAB, OCTAVE) is the ideal 
framework to start a scientific programming activity. Debugging is very simple 
and the wide variety of available numerical tools (for solving linear systems, 
integrating ordinary differential equations, etc.) allows one to concentrate on 
the main features of the solution algorithm. The highly versatile graphical
interface is also very important for easy visualization of the obtained results.

Our programs are written with a general concern for simplicity and effi- 
ciency on ordinary personal computers; program lines are commented in what 
we hope is sufficient detail for the reader to follow mathematical develop- 
ments. Programming tricks are discussed in the text when they seem to be of 
general interest. Projects 11 and 12 are also provided with more elaborate ver- 
sions of the programs, using interactive graphical user interfaces. The reader 
should try to modify these programs to test different suggested run cases or 
extensions of the projects. We believe that experience with these simple pro- 
grams will be valuable in writing numerical codes using compiled languages 
(such as Fortran, C, or C++) to solve real industrial problems on mainframe 
computers.


Numerical Approximation of Model Partial
Differential Equations


Project Summary 


Level of difficulty: 1


Keywords: Linear differential equations; numerical integration 
methods; finite difference schemes: Euler schemes, 
Runge-Kutta schemes 


Application fields: Transport phenomena, diffusion, wave propagation 


This first chapter is intended as a quick introduction to basic discretization 
techniques of time-dependent partial differential equations (PDEs). We con- 
sider it important that the reader, before tackling the complex problems of 
the next chapters, have some understanding of the mathematical and phys- 
ical properties of the following model PDEs: the convection equation, the 
wave equation, and the heat equation. This chapter is therefore organized as 
a collection of several short exercises in which model PDEs are theoretically 
analyzed and numerically solved using the simplest discretization methods. 
The essential features of numerical methods are presented, with emphasis on 
fundamental ideas of accuracy, stability, convergence, and numerical dissipa- 
tion. Particular care is devoted to the validation of numerical procedures by 
comparing to exact solutions available for these simple cases. 








1.1 Discrete Integration Methods for Ordinary 
Differential Equations 


We generally define a partial differential equation (PDE) as a relation between
a function of several variables and its partial derivatives. In this section, we
consider the simplest case of ordinary differential equations (ODEs), which
depend on a single independent variable (here, time), and present
discrete methods for their numerical integration. These methods (or numerical
schemes) will prove useful in the following sections when we discuss PDEs
depending on both time and space variables.

Let us consider the following problem: find a differentiable function
$u : [0, T] \to \mathbb{R}^m$ that is a solution of the ODE


$$ u'(t) = f(t, u(t)), \tag{1.1} $$


where $T$ is a nonnegative scalar and $f : [0, T] \times \mathbb{R}^m \to \mathbb{R}^m$ is a continuous
function. This problem is not completely specified by its equation: for its
integration we need to know the initial value (at $t = 0$) of the unknown
function.


Definition 1.1. A Cauchy (or initial value) problem is the coupling of the
ODE (1.1) with an initial condition


$$ u(0) = u_0, \tag{1.2} $$

where $u_0$ is a given vector in $\mathbb{R}^m$.


Theoretical results on existence and uniqueness of the solution of the
problem (1.1)-(1.2) go back to Cauchy in 1824. The reader interested in a more
mathematical approach to the problem may refer to the many existing
books on ODEs (see, for instance, the references at the end of this chapter).
We adopt here a more practical point of view, and we start directly by
presenting simple numerical methods to compute approximations of the solution
in the scalar (or one-dimensional) case, $m = 1$.

Since the computer can deal only with a finite number of discrete values,
the numerical algorithm to solve the Cauchy problem (1.1)-(1.2) starts by
setting the points $t_0, t_1, \dots, t_N$ at which the solution will be computed. The
points $t_n$, $n = 0, \dots, N$, define a discretization (or a grid) of the interval
$I = [0, T]$. The equidistant or regular distribution of the grid points is the
simplest and will be used in this chapter. We set (see Fig. 1.1) $t_n = nh$, with
$h = T/N$ the constant discretization step (or time step if $t$ is regarded as a
time variable) and define the subintervals $I_n = [t_n, t_{n+1}]$, $n = 0, \dots, N-1$
(notice that $t_0 = 0$ and $t_N = T$).

The numerical approximation of the Cauchy problem consists in building
a sequence of numbers (depending on $N$) $u_0^{(N)}, \dots, u_N^{(N)}$ that approximate
the values $u(t_0), \dots, u(t_N)$ of the exact solution $u(t)$ at the same
computation points. We always start with $u_0^{(N)} = u_0$ in order to satisfy the initial
value condition $u(t_0) = u_0$. In order to simplify notation, we will refer, when
possible, to $u_n^{(N)}$ by $u_n$.


1.1.1 Construction of Numerical Integration Schemes 


Having discretized the definition interval $I$, we must find a formula to compute
values $u_n$, for $n = 1, \dots, N$. Such a formula, which is usually called a numerical
scheme, is obtained by discretizing the differential operator in the ODE. We
present here two types of methods that can be used to build numerical schemes
for the ODE (1.1). Remember that any integration scheme will start from the
value $u_0$ imposed by the initial condition.


Fig. 1.1. Regular grid and numerical approximation of an ODE.


Methods Based on Finite Differences 


This type of method consists in writing the equation (1.1) at time $t = t_n$ and
replacing $u'(t_n)$ by a finite difference approximation. For this purpose we use
a Taylor series expansion to approximate the values of the unknown $u$ for $t$
close to $t_n$. We consider, for instance, the example of the first derivative.


Definition 1.2. The discretization step $h$ being fixed, we define the following
finite difference operators:


• forward or progressive

$$ D^+ u(t) = \frac{u(t+h) - u(t)}{h}, \tag{1.3} $$

• backward or regressive

$$ D^- u(t) = \frac{u(t) - u(t-h)}{h}, \tag{1.4} $$

• central

$$ D^0 u(t) = \frac{u(t+h) - u(t-h)}{2h}. \tag{1.5} $$


Let us assume that the function $u$ is twice continuously differentiable. Then
there exists $\theta_n^+ \in [0, h]$ such that


$$ u(t_{n+1}) = u(t_n) + h\,u'(t_n) + \frac{h^2}{2}\,u''(t_n + \theta_n^+). \tag{1.6} $$


We can derive from this expansion an approximation of $u'(t_n)$:


$$ u'(t_n) = \frac{u(t_{n+1}) - u(t_n)}{h} - \frac{h}{2}\,u''(t_n + \theta_n^+) \approx D^+ u(t_n) \tag{1.7} $$


and calculate the approximation (or truncation) error


$$ \varepsilon_n = |u'(t_n) - D^+ u(t_n)| \le \frac{h}{2} \max_{t \in I_n} |u''(t)|. \tag{1.8} $$

Assuming that $|u''|$ is bounded, we infer that the truncation error decays to 0
with $h$. We conventionally denote the approximation error by $O(h)$ and write:


$$ u'(t_n) = D^+ u(t_n) + O(h). $$





Definition 1.3. We say that $D^+ u(t_n)$ is a first-order approximation of $u'(t_n)$.
We generally define the order of accuracy of the difference approximation as
the power of $h$ with which the approximation error tends to zero.


It is easy to see that $D^- u(t_n) = [u(t_n) - u(t_{n-1})]/h$ is also a first-order
approximation of $u'(t_n)$, while $D^0 u(t_n) = [u(t_{n+1}) - u(t_{n-1})]/(2h)$ is a
second-order (i.e., the order of accuracy is two) approximation of $u'(t_n)$.

More generally, it is possible to use linear combinations of several finite
difference operators to find approximations of $u'(t_n)$. For instance, we can
use the approximation

$$ u'(t_n) \approx \alpha D^- u(t_n) + \beta D^0 u(t_n) + \gamma D^+ u(t_n), \tag{1.9} $$


with parameters $\alpha$, $\beta$, and $\gamma$ chosen such that the approximation has the
highest possible order of accuracy.

Taylor series expansion remains the basic tool for building approximations
of higher-order derivatives. For the second derivative, for instance, the simplest
recipe is the following: continue the expansion (1.6) to the fourth order, write
a similar expansion for $u(t_{n-1})$, and sum the two relationships. A centered
second-order approximation of the second derivative is thus obtained:


$$ u''(t_n) \approx D^+ D^- u(t_n) = \frac{u(t_{n+1}) - 2u(t_n) + u(t_{n-1})}{h^2}. \tag{1.10} $$
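
The orders of accuracy stated above can be observed numerically. The following is a minimal sketch, not taken from the book's scripts: the test function $u(t) = \sin t$, the evaluation point, and the range of steps are arbitrary assumptions chosen for illustration. It estimates the order as the slope of the error when $h$ is halved.

```matlab
% Observed order of accuracy of D+, D-, D0 and D+D- (sketch; the test
% function sin(t) and the point t0 = 1 are assumptions of this example).
u   = @(t) sin(t);           % test function
du  = @(t) cos(t);           % its exact first derivative
d2u = @(t) -sin(t);          % its exact second derivative
t0  = 1;                     % evaluation point
h   = 2.^(-(3:8))';          % steps h = 1/8, 1/16, ..., 1/256 (column vector)

Dp   = (u(t0+h) - u(t0))./h;                   % forward difference  (1.3)
Dm   = (u(t0) - u(t0-h))./h;                   % backward difference (1.4)
D0   = (u(t0+h) - u(t0-h))./(2*h);             % central difference  (1.5)
DpDm = (u(t0+h) - 2*u(t0) + u(t0-h))./h.^2;    % second derivative   (1.10)

% One column of errors per operator, one row per step h
err = [abs(Dp - du(t0)), abs(Dm - du(t0)), abs(D0 - du(t0)), abs(DpDm - d2u(t0))];

% Each halving of h divides the error by about 2^p for an order-p formula
order = log(err(1:end-1,:)./err(2:end,:))/log(2);
disp(order(end,:))           % expected to be close to [1 1 2 2]
```

For much smaller values of $h$ the measured orders degrade, because rounding errors in the differences start to dominate the truncation error.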


We now address the problem of building numerical schemes for the ODE
(1.1) using the previous finite difference approximations. Considering the ODE
at time $t_n$ and replacing $u'(t_n)$ by $D^+ u(t_n)$, we obtain the scheme


$$ u_{n+1} = u_n + h f(t_n, u_n). \tag{1.11} $$

We recall that $u_{n+1}$ and $u_n$ are numerical approximations of $u(t_{n+1})$
and $u(t_n)$, respectively.

The scheme (1.11) is called the explicit Euler scheme, or simply the Euler
scheme. This method is said to be explicit because $u_{n+1}$ depends explicitly
on $t_n$ and the old value $u_n$. More generally, a numerical scheme is explicit if
$u_{n+1}$ can be calculated explicitly from quantities that are already known (i.e.,
values of the solution at previous times).

Consider now the ODE (1.1) at time $t_{n+1}$ and replace $u'(t_{n+1})$ by $D^- u(t_{n+1})$;
we obtain the implicit Euler scheme


$$ u_{n+1} = u_n + h f(t_{n+1}, u_{n+1}). \tag{1.12} $$


This time, $u_{n+1}$ is computed as the solution (if it exists!) of an implicit
equation. This requires more work, in particular when the function $u \mapsto f(t, u)$ is
nonlinear with respect to $u$.
The approximation $u'(t_n) \approx D^0 u(t_n)$ in (1.1), written at time $t_n$, leads to
the scheme

$$ u_{n+1} = u_{n-1} + 2h f(t_n, u_n), \tag{1.13} $$


called the leapfrog (or midpoint) scheme.
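
To make the three schemes concrete, here is a minimal sketch that applies them to the linear test problem $u'(t) = -a\,u(t)$, $u(0) = u_0$. The problem, the values of `a`, `T`, and `N`, and all variable names are assumptions made for this illustration. For a linear right-hand side the implicit Euler equation can be solved by hand, which is what the code does; a nonlinear $f$ would require an iterative solver instead. The leapfrog scheme needs two starting values, so its first step is taken here with the explicit Euler scheme.

```matlab
% Explicit Euler (1.11), implicit Euler (1.12) and leapfrog (1.13)
% on u' = -a*u, u(0) = u0 (sketch; all parameter values are assumptions).
a = 1;  u0 = 1;  T = 1;  N = 100;  h = T/N;
f = @(t,u) -a*u;                  % right-hand side of u' = f(t,u)
t = (0:N)*h;                      % regular grid t_n = n*h

ue = zeros(1,N+1);  ui = ue;  ul = ue;   % storage: index n+1 holds u_n
ue(1) = u0;  ui(1) = u0;  ul(1) = u0;

for n = 1:N
    ue(n+1) = ue(n) + h*f(t(n), ue(n));          % explicit Euler
    ui(n+1) = ui(n)/(1 + a*h);                   % implicit Euler, solved for linear f
    if n == 1
        ul(2) = ul(1) + h*f(t(1), ul(1));        % leapfrog start: one Euler step
    else
        ul(n+1) = ul(n-1) + 2*h*f(t(n), ul(n));  % leapfrog
    end
end

uex  = u0*exp(-a*t);              % exact solution on the grid
errs = [max(abs(ue-uex)), max(abs(ui-uex)), max(abs(ul-uex))];
fprintf('max errors: Euler %.2e, implicit Euler %.2e, leapfrog %.2e\n', errs);
```

With these values the leapfrog error is noticeably smaller than the two Euler errors, in line with its higher order of accuracy.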


Methods Based on Quadrature Formulas 


Another way to build a numerical scheme is based on quadrature formulas
(numerical integration is also called quadrature). Integrating the ODE (1.1)
on the interval $I_n = [t_n, t_{n+1}]$, we obtain


$$ u(t_{n+1}) - u(t_n) = \int_{t_n}^{t_{n+1}} f(t, u(t))\, dt =: \mathcal{I}_n. \tag{1.14} $$


We can hence compute $u(t_{n+1})$ starting from the old value $u(t_n)$ if we are able
to approximate the integral $\mathcal{I}_n$. We are thus led back to a quadrature problem.
Several quadrature rules can be used to estimate the integral in (1.14):


• the left endpoint rule

$$ \mathcal{I}_n \approx h f(t_n, u_n), \tag{1.15} $$

leading to the explicit Euler scheme (1.11);

• the right endpoint rule

$$ \mathcal{I}_n \approx h f(t_{n+1}, u_{n+1}), \tag{1.16} $$

defining the implicit Euler scheme (1.12);

• the midpoint (or rectangle) rule

$$ \mathcal{I}_n \approx h f(t_n + h/2,\, u(t_n + h/2)), \tag{1.17} $$

leading, using the approximation

$$ u(t_n + h/2) \approx u(t_n) + \frac{h}{2}\,u'(t_n) = u(t_n) + \frac{h}{2}\,f(t_n, u(t_n)), \tag{1.18} $$

to the modified explicit Euler scheme:

$$ u_{n+1} = u_n + h f\!\left(t_n + \frac{h}{2},\, u_n + \frac{h}{2} f(t_n, u_n)\right); \tag{1.19} $$

• the trapezoid rule

$$ \mathcal{I}_n \approx \frac{h}{2}\left[ f(t_n, u_n) + f(t_{n+1}, u_{n+1}) \right], \tag{1.20} $$

yielding the semi-implicit Crank-Nicolson scheme.
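
The effect of the quadrature rule on accuracy can be seen by halving the step. The sketch below is an illustration under stated assumptions (linear test problem $u' = -a\,u$, the values of `a`, `T`, and the step counts are not from the book). It compares the schemes obtained from the left endpoint, midpoint, and trapezoid rules; the trapezoid (Crank-Nicolson) update is written in closed form, which is possible only because $f$ is linear here.

```matlab
% Observed convergence order at t = T for the left endpoint, midpoint and
% trapezoid rules (sketch; test problem and parameters are assumptions).
a = 2;  u0 = 1;  T = 1;
f = @(t,u) -a*u;
Ns  = [20 40 80];                       % successive halvings of h
err = zeros(numel(Ns), 3);
for k = 1:numel(Ns)
    N = Ns(k);  h = T/N;
    ue = u0;  um = u0;  uc = u0;
    for n = 0:N-1
        tn = n*h;
        ue = ue + h*f(tn, ue);                          % explicit Euler (1.11)
        um = um + h*f(tn + h/2, um + (h/2)*f(tn, um));  % modified Euler (midpoint rule)
        uc = uc*(1 - a*h/2)/(1 + a*h/2);                % Crank-Nicolson, linear f only
    end
    err(k,:) = abs([ue um uc] - u0*exp(-a*T));          % errors at the final time
end
order = log2(err(1:end-1,:)./err(2:end,:));             % one row per halving of h
disp(order)                                             % columns close to 1, 2, 2
```

The first column should stay near 1 and the other two near 2, which matches the orders listed for these schemes in Table 1.1.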


1.1.2 General Form of Numerical Schemes 
The general form of a numerical scheme for the ODE (1.1) is

$$ u_{n+1} = F(h;\; t_{n+1}, u_{n+1};\; t_n, u_n;\; \dots). \tag{1.21} $$


If $F$ depends on $q$ previous values $u_{n-j}$, $j = 0, \dots, q-1$, the scheme is said to
be a $q$-step scheme. For instance, the leapfrog scheme is a two-step scheme. If
$F$ does not depend on the solution at time level $t_{n+1}$, the scheme is said to
be explicit. Otherwise, the scheme is implicit.


Remark 1.1. To start a one-step scheme, a single value is needed; this is $u_0$,
which is always set by the initial condition $u(0)$. The situation is different in the case
of a $q$-step scheme with $q > 1$; this scheme can be used to compute values $u_n$ for
$n \ge q$, once the first $q$ values $u_0, \dots, u_{q-1}$ are known. Since only the initial
condition $u_0$ is provided, the missing intermediate values can be computed
using lower-step schemes. For example, a one-step scheme can be used to
compute $u_1$, a two-step scheme to compute $u_2$, and a $(q-1)$-step scheme to
compute $u_{q-1}$.
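
As an illustration of this remark, the sketch below starts the two-step Adams-Bashforth scheme of Table 1.1 with a single explicit Euler step. The right-hand side $f(t,u) = -u + \sin t$, its exact solution, and the values of `T` and `N` are assumptions chosen so that the error can be measured; they do not come from the book.

```matlab
% Starting a two-step scheme (second-order Adams-Bashforth) with a one-step
% scheme (sketch; the test problem and parameters are assumptions).
f  = @(t,u) -u + sin(t);          % sample right-hand side
u0 = 1;  T = 2;  N = 200;  h = T/N;
t  = (0:N)*h;
u  = zeros(1,N+1);                % index n+1 holds u_n

u(1) = u0;                                    % u_0: initial condition
u(2) = u(1) + h*f(t(1), u(1));                % u_1: one explicit Euler step
for n = 2:N                                   % u_2, ..., u_N: two-step scheme
    u(n+1) = u(n) + h*(1.5*f(t(n), u(n)) - 0.5*f(t(n-1), u(n-1)));
end

% Exact solution of u' = -u + sin(t), u(0) = 1
uex = 1.5*exp(-t) + 0.5*(sin(t) - cos(t));
fprintf('max error = %.2e\n', max(abs(u - uex)));
```

The single first-order starting step does not spoil the overall second-order accuracy, because its local error is already of size $O(h^2)$.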


Definition 1.4. For the numerical scheme (1.21) we define the formal local
truncation error as


$$ \varepsilon_n = u(t_{n+1}) - F(h;\; t_{n+1}, u(t_{n+1});\; t_n, u(t_n);\; \dots), \tag{1.22} $$


where $u(t)$ is the solution of the ODE (1.1). The scheme has order of accuracy
$p$ if $\varepsilon_n = O(h^{p+1})$ as $h \to 0$. The scheme is said to be consistent if it has order of
accuracy $p \ge 1$.
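
Definition 1.4 can be applied numerically. The following sketch evaluates the local truncation error of the leapfrog scheme (1.13) on a known solution and compares it with the term $\frac{h^3}{3}u'''(t_n)$, which is what a Taylor expansion about $t_n$ gives as leading term. The choice $u(t) = e^{-t}$ (so that $f(t,u) = -u$), the point `tn`, and the steps are assumptions of this example.

```matlab
% Local truncation error (1.22) of the leapfrog scheme on the exact
% solution u(t) = exp(-t) of u' = -u (sketch; choices are assumptions).
u   = @(t) exp(-t);              % exact solution
f   = @(t,v) -v;                 % right-hand side, f(t,u) = -u
d3u = @(t) -exp(-t);             % third derivative of u
tn  = 0.5;
h   = [0.1; 0.05; 0.025];

% F for leapfrog is u(t_{n-1}) + 2h f(t_n, u(t_n)), evaluated on the exact solution
eps_n = u(tn+h) - (u(tn-h) + 2*h.*f(tn, u(tn)));
lead  = (h.^3/3)*d3u(tn);        % expected leading term, of size O(h^3)

disp([eps_n, lead, eps_n./lead]) % last column tends to 1 as h decreases
```

Since the error behaves like $h^3 = h^{p+1}$, the observed order of accuracy is $p = 2$.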


The idea behind this definition is that the discretized equation should tend
to the exact ODE when $h \to 0$. In other words, when applying the numerical
scheme to the exact solution function $u(t)$ one should recover (by Taylor series
expansions) the original ODE plus a remainder representing the truncation
error. Let us illustrate this by the example of the leapfrog scheme (1.13), for
which $F = u_{n-1} + 2h f(t_n, u_n)$. Using Taylor series expansions about $t = t_n$,
we obtain for the truncation error (1.22) the expression

$$ \varepsilon_n = 2h\left[ u'(t_n) - f(t_n, u(t_n)) \right] + \frac{h^3}{3}\,u'''(t_n) + \cdots. \tag{1.23} $$

Since $u'(t_n) = f(t_n, u(t_n))$, we conclude that the leapfrog scheme is a
second-order scheme, i.e., the order of accuracy is two. Some numerical schemes
commonly used in practice are summarized in Table 1.1.


Scheme (order)                     Formula

Explicit Euler (first order)       $u_{n+1} = u_n + h f(t_n, u_n)$

Implicit Euler (first order)       $u_{n+1} = u_n + h f(t_{n+1}, u_{n+1})$

Leapfrog (second order)            $u_{n+1} = u_{n-1} + 2h f(t_n, u_n)$

Modified Euler (second order)      $u_{n+1} = u_n + h f(t_n + \frac{h}{2},\, u_n + \frac{h}{2} f(t_n, u_n))$

Crank-Nicolson (second order)      $u_{n+1} = u_n + \frac{h}{2}\left[ f(t_n, u_n) + f(t_{n+1}, u_{n+1}) \right]$

Adams-Bashforth (second order)     $u_{n+1} = u_n + h\left[ \frac{3}{2} f(t_n, u_n) - \frac{1}{2} f(t_{n-1}, u_{n-1}) \right]$

Adams-Bashforth (third order)      $u_{n+1} = u_n + h\left[ \frac{23}{12} f(t_n, u_n) - \frac{16}{12} f(t_{n-1}, u_{n-1}) + \frac{5}{12} f(t_{n-2}, u_{n-2}) \right]$

Adams-Moulton (third order)        $u_{n+1} = u_n + h\left[ \frac{5}{12} f(t_{n+1}, u_{n+1}) + \frac{8}{12} f(t_n, u_n) - \frac{1}{12} f(t_{n-1}, u_{n-1}) \right]$

Runge-Kutta (Heun, second order)   $k_1 = h f(t_n, u_n)$,
                                   $k_2 = h f(t_n + h,\, u_n + k_1)$,
                                   $u_{n+1} = u_n + \frac{1}{2}(k_1 + k_2)$

Runge-Kutta (fourth order)         $k_1 = h f(t_n, u_n)$,
                                   $k_2 = h f(t_n + \frac{h}{2},\, u_n + \frac{k_1}{2})$,
                                   $k_3 = h f(t_n + \frac{h}{2},\, u_n + \frac{k_2}{2})$,
                                   $k_4 = h f(t_n + h,\, u_n + k_3)$,
                                   $u_{n+1} = u_n + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)$

Table 1.1. Numerical schemes for the ODE $u'(t) = f(t, u)$.


1.1.3 Application to the Absorption Equation


The model equation that describes an absorption (or production) phenomenon
is the following: find a function $u : \mathbb{R}^+ \to \mathbb{R}$ that is a solution of the Cauchy
problem


$$ u'(t) + a\,u(t) = f(t), \quad \forall t > 0, \qquad u(0) = u_0, \tag{1.24} $$

where $a \in \mathbb{R}$ is a given physical constant and the source term $f$ takes into
account the production in time of the quantity $u$.


Example 1.1. The intensity of the radiation emitted by a radioactive body is
estimated by measuring the concentration $u(t)$ of an unstable isotope. This
concentration decays by a factor of two during a time interval $T$ (called the
half-life) according to the law


$$ u'(t) = -a\,u(t), \quad \text{with } a = \frac{\ln 2}{T}. $$
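
The value of the constant in Example 1.1 follows directly from the definition of the half-life. As a short check, assuming the decay law $u'(t) = -a\,u(t)$ with $u(0) = u_0$:

```latex
\begin{align*}
u(t) &= u_0\, e^{-at}, \\
u(T) &= \tfrac{1}{2}\, u_0
  \;\Longrightarrow\; e^{-aT} = \tfrac{1}{2}
  \;\Longrightarrow\; a = \frac{\ln 2}{T}.
\end{align*}
```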


Exercise 1.1. Consider the Cauchy problem (1.24). 


1. We set $u(t) = e^{-at} v(t)$. Write the ordinary differential equation satisfied
by $v$. Solve this equation analytically and verify that


$$ u(t) = e^{-at} \left( u_0 + \int_0^t e^{as} f(s)\, ds \right). \tag{1.25} $$


2. Derive the exact solution in the case of $a$ depending on $t$.

3. Assuming that $a$ and $f$ are constants, derive an expression for $u$ and
calculate $\lim_{t \to +\infty} u(t)$.

4. Consider $f = 0$ and a more general coefficient $a \in \mathbb{C}$, with real part
$a_r > 0$. Show that $\lim_{t \to +\infty} u(t) = 0$.

5. Write a MATLAB function to implement the explicit Euler scheme (1.11).
The definition header of the function will be as follows:


function u=PDE_EulerExp(fun,u0,t0,t1,n)
% Input arguments:
%   fun  the name of the right-hand-side ODE function
%   t0   the initial time
%   u0   the initial condition at t0
%   t1   the final time
%   n    the number of time steps between t0 and t1
% Output arguments:
%   u    the dimension (n+1) vector containing the numerical
%        solution at times t0+i*h, with h=(t1-t0)/n

Hint: use the MATLAB built-in function feval to evaluate the parameter
function fun within the PDE_EulerExp function.
In a MATLAB program (or script), call the PDE_EulerExp function to
solve the ODE $u'(t) + 4u(t) = 0$ with the initial condition $u(0) = 1$.
Set $t_0 = 0$, $t_1 = 3$, and $n = 24$ ($h = 1/8$). Plot the results, both exact
and numerical solutions superimposed on a single graph, and comment on
them. Perform the same computation for $n = 6$ ($h = 1/2$). Comment on
the results.

6. Use instead of the explicit Euler scheme the fourth-order Runge-Kutta
scheme (write a function PDE_RKutta4 using PDE_EulerExp as a model).
Comment on the results obtained for $h = 1/2$.


A solution of this exercise is proposed in Sect. 1.3 at page 19. 


1.1.4 Stability of a Numerical Scheme 





We consider here the absorption equation for $f = 0$ and $a \in \mathbb{R}^+$. The exact
solution is then $u(t) = e^{-at} u_0$, with the property $\lim_{t \to +\infty} u(t) = 0$. Assume
that we want to compute this solution using the explicit Euler scheme (1.11).
We obtain a sequence of values $u_n = (1 - ah)^n u_0$. It is easy to see that


• if $h > 2/a$, then $1 - ah < -1$ and the sequence $(u_n)$ diverges;
• if $0 < h < 2/a$, then $|1 - ah| < 1$ and the sequence $(u_n)$ decays to 0 as
$t \to \infty$, reproducing the behavior of the exact solution.
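
The two cases, together with the limit case $h = 2/a$, can be checked with a few lines. This is a minimal sketch using the closed form $u_n = (1 - ah)^n u_0$ of the explicit Euler iterates; the values $a = 4$ and $T = 3$ echo Exercise 1.1, while the three step sizes are assumptions chosen to fall below, at, and above the threshold $2/a$.

```matlab
% Explicit Euler iterates for u' = -a*u with three step sizes
% (sketch; the step sizes are chosen for illustration).
a = 4;  u0 = 1;  T = 3;
hs = [1/8, 1/2, 3/5];              % h < 2/a, h = 2/a, h > 2/a
n  = round(T./hs);                 % number of steps needed to reach t = T
uT = (1 - a*hs).^n * u0;           % closed form u_n = (1 - a*h)^n * u0
for k = 1:numel(hs)
    fprintf('h = %5.3f   1-ah = %5.2f   u_n = %+.3e\n', hs(k), 1 - a*hs(k), uT(k));
end
```

For the smallest step the computed value is close to zero, for $h = 2/a$ it keeps modulus 1, and for the largest step it has already grown beyond the initial value after only five steps.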





Let us assume at this point that the reader, pushed by curiosity, has already 
answered question 5 of Exercise 1.1. The above analysis explains the strange 
behavior of the numerical solution obtained for a discretization step h = 1/2 
and a = 4 (the solution alternates between the values +1 and -1, see Fig.
1.3). The real question behind this observation is how to be sure that the 
numerical scheme gives the right solution. Part of the answer is related to 
the accuracy of the scheme: if the scheme is consistent (see Definition 1.4), 
we know that the discrete scheme commits local (at a given time) errors 
that vanish when h → 0. Unfortunately, as can be seen from our example,
consistency is not sufficient to achieve convergence to the exact solution. The 
stability of the numerical scheme is also required for a successful numerical 
computation. Intuitively, we can say that a numerical scheme will be stable if 
it does not magnify the errors appearing during the computation. 

The fundamental concept of stability can be mathematically addressed in
several ways (see, for example, Richtmyer and Morton, 1967; LeVeque, 1992;
Hirsch, 1988; Trefethen, 1996). The widely used definition of stability (also
known as zero-stability, or Lax-Richtmyer stability for PDEs) requires that the
computed values remain bounded when $h \to 0$ for a fixed integration interval
$[0, T]$. This is an important concept since, as stated by the well-known
equivalence theorem (due to Dahlquist for ODEs and to Lax and Richtmyer
for PDEs, see Trefethen (1996)), zero-stability is a necessary and sufficient
condition for a consistent scheme to be convergent.

In some practical applications, it is not always possible to take $h$ small
enough for the zero-stability to apply. This is the case of stiff ODEs, i.e.,
ODEs involving widely different time scales (see Chap. 2). For this type of ODE,
we generally use the concept of absolute stability, which considers the behavior
of the numerical scheme when the time step $h$ is held fixed and $t \to \infty$.

We illustrate in the following the concept of absolute stability by the
example of the absorption equation ($u'(t) = -a\,u(t)$).¹ We consider the simplest
case of one-step schemes in Table 1.1. We can recast these numerical schemes
into the general form


$$ u_{n+1} = G(-ah)\,u_n = \dots = [G(-ah)]^{n+1} u_0. \tag{1.26} $$


$G$ is called the amplification function² and is supposed to reflect the behavior
of the exact solution, since the latter satisfies the relationship


$$ u(t_{n+1}) = e^{-ah}\,u(t_n). \tag{1.27} $$


The reader is invited to derive the following expressions for the amplification
function (we denote $z = -ah$):


Explicit Euler:               $G(z) = 1 + z$,
Implicit Euler:               $G(z) = 1/(1 - z)$,
Modified Euler:               $G(z) = 1 + z + z^2/2$,
Runge-Kutta (second order):   $G(z) = 1 + z + z^2/2$,
Runge-Kutta (fourth order):   $G(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$.


A sufficient condition for stability is now $|G(-ah)| < 1$. This stability
condition ensures that the numerical solution has the same behavior as the exact
solution when $t \to \infty$, since


$$ \lim_{n \to +\infty} |u_n| \le \lim_{n \to +\infty} |u_0|\,|G(-ah)|^n = 0. $$
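
For a real coefficient $a > 0$, the condition $|G(-ah)| < 1$ translates into an upper bound on the product $ah$. The sketch below estimates this bound by sampling $z = -ah$ on the negative real axis for the amplification functions of the one-step schemes considered here. The sampling interval $[-4, 0)$, the resolution, and the variable names are assumptions of this example, so the printed limits are accurate only up to the grid spacing.

```matlab
% Largest a*h on the real axis for which |G(-a*h)| < 1 (sampled estimate;
% the sampling range and resolution are assumptions of this sketch).
G = { @(z) 1 + z, ...                               explicit Euler
      @(z) 1./(1 - z), ...                          implicit Euler
      @(z) 1 + z + z.^2/2, ...                      modified Euler and RK2
      @(z) 1 + z + z.^2/2 + z.^3/6 + z.^4/24 };   % RK4
names = {'explicit Euler', 'implicit Euler', 'modified Euler / RK2', 'RK4'};

z    = -linspace(1e-3, 4, 4000);       % samples of z = -a*h, spacing about 1e-3
zmax = zeros(1,4);
for k = 1:4
    bad = find(abs(G{k}(z)) >= 1, 1);  % first sample where |G| < 1 fails
    if isempty(bad)
        zmax(k) = Inf;                 % no violation found on the sampled range
    else
        zmax(k) = -z(bad);
    end
    fprintf('%-22s |G(-ah)| < 1 for a*h < %g\n', names{k}, zmax(k));
end
```

The explicit Euler value reproduces the bound $h < 2/a$ found earlier, the implicit Euler scheme shows no restriction on the sampled range, and the fourth-order Runge-Kutta scheme allows a somewhat larger step, close to $ah \approx 2.79$.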

Definition 1.5. The locus $S$ of points $z \in \mathbb{C}$ for which $|G(z)| < 1$ is called
the (absolute) stability region of the scheme.


For example, the stability region $S$ of the explicit Euler scheme is the open
disk of radius 1, centered at the point $(-1, 0)$. The scheme will hence be
absolutely stable if the discretization step $h$ is chosen such that $|1 - ah| < 1$.


¹ The linear ODE $u'(t) = a\,u(t)$ for some constant $a \in \mathbb{C}$ is generally used as
model equation to investigate the absolute stability of a numerical scheme. For
nonlinear systems of ODEs, a similar analysis can be applied after linearization
and diagonalization (see Chap. 2); this type of stability is often referred to as
the eigenvalue stability.

² For multistep schemes, $G$ becomes a matrix; for the analysis of the absolute
stability of multistep schemes, see, for instance, Trefethen (1996).

Remark 1.2. According to Definition 1.5, the absolute stability region $S$
contains the points for which $|u_n| \to 0$ as $t \to \infty$. It is interesting to note that in
some textbooks (e.g., Trefethen (1996)) the absolute stability region is defined
as the locus $\bar{S}$ of points $z \in \mathbb{C}$ for which $|G(z)| \le 1$, i.e., we ask that the
numerical solution $u_n$ be bounded as $t \to \infty$. In general, if $S$ is not empty, $\bar{S}$
is the closure of $S$. But there are some special cases, such as the leapfrog scheme,
for which $S$ is empty and $\bar{S} = [-i, i]$.

This second definition of the absolute stability is important since it makes
the link with the zero-stability: a numerical scheme is zero-stable if and only
if the origin $z = 0$ belongs to $\bar{S}$ (see Trefethen, 1996, for more details).
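
The special behavior of the leapfrog scheme can be observed numerically. The sketch below rests on an assumption that is standard for linear multistep schemes but is not spelled out in the text: applied to $u' = \lambda u$ with $z = \lambda h$, the leapfrog recurrence $u_{n+1} = u_{n-1} + 2z\,u_n$ has solutions built from powers of the roots $r$ of $r^2 - 2zr - 1 = 0$, so the largest root modulus plays the role of $|G(z)|$. The sample points `zs` are arbitrary.

```matlab
% Largest root modulus of the leapfrog characteristic polynomial
% r^2 - 2*z*r - 1 = 0 for a few values of z (sketch; sample points are
% assumptions, and the root criterion is the usual multistep analysis).
zs  = [0.5i, 0.99i, -0.1, -0.1+0.5i, 1.5i];
rho = zeros(size(zs));
for k = 1:numel(zs)
    r = roots([1, -2*zs(k), -1]);   % the two characteristic roots
    rho(k) = max(abs(r));           % growth factor per step of the worst mode
end
disp([real(zs); imag(zs); rho].')   % columns: Re z, Im z, largest |r|
```

The product of the two roots is $-1$, so they can never both lie strictly inside the unit disk, which is why the open region $S$ is empty. For $z$ on the imaginary segment between $-i$ and $i$ both roots have modulus 1, and elsewhere one root has modulus larger than 1.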





Exercise 1.2. Plot in the same figure the bounds of the stability regions of
the following schemes: explicit Euler, second-order Runge-Kutta, and
fourth-order Runge-Kutta. Hint: define a complex variable $z = (x, y)$ covering the
rectangle $[-4, 1] \times [-4, 4]$ (use the MATLAB built-in function meshgrid) and
plot the contour line corresponding to $|G(z)| = 1$ (function contour).

A solution of this exercise is proposed in Sect. 1.3 at page 19. 


1.2 Model Partial Differential Equations 


The PDEs presented in this chapter model elementary physical phenomena: 
convection, wave propagation, and diffusion. For each of these problems, we 
present one or several model equations, derive an exact solution in partic- 
ular cases, and compute approximate solutions using appropriate numerical 
schemes. We consider in this section the following PDEs: 


• the convection equation: $\partial_t u(x,t) + c\,\partial_x u(x,t) = f(x,t)$,
• the wave equation: $\partial^2_{tt} u(x,t) - c^2\,\partial^2_{xx} u(x,t) = 0$,
• the heat equation: $\partial_t u(x,t) - \kappa\,\partial^2_{xx} u(x,t) = f(x,t)$.


1.2.1 The Convection Equation 


The PDE describing the convection (or transport) of a quantity $u(x,t)$ at
velocity $c$ (assumed constant in the following) is


$$ \partial_t u(x,t) + c\,\partial_x u(x,t) = f(x,t), \quad \forall x \in \mathbb{R}, \ \forall t > 0, \tag{1.28} $$

with the initial condition

$$ u(x,0) = u_0(x), \quad \forall x \in \mathbb{R}. \tag{1.29} $$

The source term $f(x,t)$ generally models the production in time of $u$.


Example 1.2. The transport of a pollutant in the atmosphere is modeled by
the PDE $\partial_t u + \partial_x (cu) = 0$, where $u(x,t)$ is the concentration of the pollutant
and $c$ the wind velocity. If $c$ is assumed constant, we retrieve the form (1.28)
of the PDE. Note that $f = 0$ corresponds to the case where there is no further
production of pollutant at times $t > 0$.

Exercise 1.3. We consider the convection equation (1.28) with the initial
condition (1.29) in the case $f(x,t) = 0$ (no sources).


1. In order to compute the exact solution we introduce the change of variables

$$ X = \alpha x + \beta t, \quad T = \gamma x + \mu t \quad (\alpha, \beta, \gamma, \mu \in \mathbb{R}), \tag{1.30} $$


and define the function $U$ by $U(X, T) = u(x,t)$. Is this change of variables
a bijection? Write the PDE satisfied by $U$. What happens to this equation
if $\beta = -c\alpha$? Solve this last equation analytically. Deduce that the solution
is constant along the lines $(C_\xi)$ defined in the $(x,t)$ plane by


$$ x - ct = \xi, \quad \xi \in \mathbb{R}. \tag{1.31} $$

Definition 1.6. The lines (1.31) are called characteristic curves of the 
convection equation (1.28). 


2. We now want to find the exact solution of (1.28)-(1.29) on a finite real
interval $[a,b]$. We assume that $c > 0$ and proceed geometrically. After
drawing in the $(x,t)$ plane the characteristic curves $C_\xi$ for $\xi \in [a,b]$, show
that the solution $u(x,t)$ for all $x \in [a,b]$ and all $t > 0$ is completely
determined by the initial condition $u_0$ and an additional boundary condition


$$ u(a,t) = \varphi(t), \quad \forall t > 0. \tag{1.32} $$

Show that for a given $T$ the exact solution is


$$ u(x,T) = \begin{cases} u_0(x - cT) & \text{if } x - cT \ge a, \\[4pt] \varphi\!\left(T - \dfrac{x - a}{c}\right) & \text{if } x - cT < a. \end{cases} \tag{1.33} $$


Note that in the case $u_0(a) \ne \varphi(0)$, the solution is discontinuous along
the characteristic curve $x - cT = a$. Find the value of $T$ after which the
initial condition $u_0$ has completely left the interval $[a,b]$, i.e., $u(x,t)$ can
be written as a function of $\varphi(t)$ only. Determine the boundary condition
necessary to calculate $u$ in the case of $c < 0$.

3. For the numerical solution of the convection equation (1.28) we define a
regular space discretization


$$ x_j = a + j\,\delta x, \quad \delta x = \frac{b - a}{J}, \quad j = 0, 1, \dots, J, \tag{1.34} $$

and a time discretization ($T > 0$ is fixed)


$$ t_n = n\,\delta t, \quad \delta t = \frac{T}{N}, \quad n = 0, 1, \dots, N. \tag{1.35} $$

Write a MATLAB function to compute the exact solution (1.33) following
the model:

function uex=PDE_conv_exact_sol(a,b,c,x,T,fun_in,fun_bc)
% Input arguments:
%   a,b        the interval [a,b]
%   c>0        the convection speed
%   x          the vector x(j)=a+j*delta_x, j=0,1,...,J
%   T          the time at which the solution is computed
%   fun_in(x)  the initial condition for t=0
%   fun_bc(t)  the boundary condition for x=a
% Output argument:
%   uex        the vector of length J+1 containing the
%              exact solution

4. We assume that $c > 0$ and denote by $u_j^n$ an approximation of $u(x_j, t_n)$.
The following numerical scheme is proposed to compute $u_j^n$:
   • For $n = 0$ the initial condition is imposed: $u_j^0 = u_0(x_j)$, $j = 0, \ldots, J$.
   • For $n = 0, \ldots, N-1$ (loop in time to compute $u^{n+1}$):
      • For $j = 1, \ldots, J$ (the interior of the domain)

        $$ u_j^{n+1} = u_j^n - \frac{c\,\delta t}{\delta x}\left(u_j^n - u_{j-1}^n\right). \tag{1.36} $$

      • Set boundary value: $u_0^{n+1} = \varphi(t_{n+1})$.
   (a) Justify geometrically (draw the characteristic starting from point
   $(x_j, t_{n+1})$) that the previous algorithm is well defined if

   $$ \sigma = \frac{c\,\delta t}{\delta x} \le 1. \tag{1.37} $$

   Definition 1.7. The inequality (1.37) gives a sufficient condition for
   the stability of the upwind scheme (1.36) for the convection equation
   and is called the CFL (Courant-Friedrichs-Lewy) condition.


(b) Write a program using this algorithm to solve the convection equation 
for the following data: 


G=]0,. De Et, 750, 


$$ u_0(x) = x, \quad \varphi(t) = \sin(10\pi t). $$

Choose $J = 40$ and compute $\delta t$ from (1.37) with $\sigma = 0.8$.

Plot the solutions obtained after $n = 10, 20, 30, 40, 50$ time steps.
Compare to the exact solution.

What happens if $\sigma = 1$, or $\sigma = 1.1$? Comment on the influence of the
value of $\sigma$ on the stability of the scheme.


A solution of this exercise is proposed in Sect. 1.3 at page 21.

1.2.2 The Wave Equation


Acoustic (or elastic, or seismic) wave propagation is modeled by the following
second-order PDE:

$$ \partial^2_{tt} u(x,t) - c^2\,\partial^2_{xx} u(x,t) = 0, \quad t > 0, \tag{1.38} $$

where $c$ is the wave propagation velocity. The corresponding Cauchy problem
requires two initial conditions:

$$ u(x,0) = u_0(x), \quad \partial_t u(x,0) = u_1(x). \tag{1.39} $$

Example 1.3. The oscillations of an elastic string are described by the equation
(1.38), where the function $u(x,t)$ represents the displacement of the string in
the vertical plane. The propagation speed depends on the tension $\tau$ in the
string and on its linear density $\rho$ according to the law $c = \sqrt{\tau/\rho}$. Relations
(1.39) provide the initial position and velocity of the string.





If the string is considered infinite, the equation is defined on the whole set
$\mathbb{R}$. For a string of finite length $\ell$, boundary conditions must be imposed in
addition. For instance, if the string is fixed at both ends, the corresponding
boundary conditions will be

$$ u(0,t) = u(\ell,t) = 0, \quad \forall t > 0. \tag{1.40} $$


Definition 1.8. The boundary conditions (1.40) are called Dirichlet condi- 
tions (the values of the solution are imposed at the boundary of the compu- 
tational domain). When the imposed values are null, the boundary conditions 
are said to be homogeneous. 


Infinite string. We first consider the case of an infinite vibrating string. 


Exercise 1.4. Exact solution for the infinite string. Using the change of
variables (1.30), we define the function $U(X,T) = u(x,t)$ and attempt to derive
the exact solution:

• Write $\partial^2_{tt} u$ and $\partial^2_{xx} u$ as functions of the derivatives of $U$. Derive the PDE
  satisfied by $U$.

• Write this PDE for $\mu = c\gamma$ and $\beta = -c\alpha$. Show that there exist two
  functions $F(X)$ and $G(T)$ such that $U(X,T) = F(X) + G(T)$.

• Conclude that the general solution of the wave equation can be written as

  $$ u(x,t) = f(x - ct) + g(x + ct). \tag{1.41} $$

• Using the initial conditions (1.39) show that

  $$ u(x,t) = \frac{u_0(x - ct) + u_0(x + ct)}{2} + \frac{1}{2c}\int_{x-ct}^{x+ct} u_1(s)\,ds. \tag{1.42} $$

A solution of this exercise is proposed in Sect. 1.3 at page 25.


Domain of dependence, CFL condition. The expression (1.42) shows that the
value of $u(x,t)$ depends only on the initial values $u_0$ and $u_1$ restricted to the
interval $[x - ct, x + ct]$ (see Fig. 1.2).
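As a minimal sketch of how formula (1.42) can be evaluated, the script below computes $u(x,t)$ at a single point by numerical quadrature and checks the domain-of-dependence property. The wave speed, the initial data, the sample point, and the helper name dal are all assumptions chosen for illustration; for this particular data the formula reduces to $\sin(2\pi(x + ct))$, which serves as a reference value.

```matlab
% d'Alembert formula (1.42) evaluated at one point by numerical quadrature.
c  = 2;                                  % assumed wave speed
u0 = @(s) sin(2*pi*s);                   % assumed initial position
u1 = @(s) 2*pi*c*cos(2*pi*s);            % assumed initial velocity
% Hypothetical helper: value of (1.42) at a scalar point (x,t)
dal = @(u0, u1, x, t) (u0(x - c*t) + u0(x + c*t))/2 ...
                      + integral(u1, x - c*t, x + c*t)/(2*c);
x = 0.3;  t = 0.2;                       % sample point; [x-ct, x+ct] = [-0.1, 0.7]
u    = dal(u0, u1, x, t);
uref = sin(2*pi*(x + c*t));              % closed form for this particular data
% Perturb u0 only outside [x-ct, x+ct]: the value at (x,t) must not change.
u0p = @(s) u0(s) + (abs(s - 5) < 0.1);
up  = dal(u0p, u1, x, t);
fprintf('u = %.6f, reference = %.6f, perturbed = %.6f\n', u, uref, up);
```

The perturbation is placed around $s = 5$, far outside the interval $[x - ct, x + ct]$, so it affects neither the two end-point values of $u_0$ nor the integral of $u_1$.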


Definition 1.9. The lines of equations $x - ct = \xi$ and $x + ct = \xi$, with $\xi \in \mathbb{R}$
a given constant, are the characteristic curves of the wave equation (1.38).


We now intend to use a numerical scheme to solve the wave equation.
At time $t_{n+1} = (n+1)\,\delta t$, the value of the solution $u_j^{n+1}$ at point $x_j = j\,\delta x$
is defined by the information transported from the level $t_n$ along the two
characteristics starting from the point $(x_j, t_{n+1})$ (see Fig. 1.2). The region
located between the two characteristics is called the domain of dependence of
the wave equation.











Fig. 1.2. Domain of dependence for the wave equation.





Exercise 1.5. Justify the following numerical scheme for the wave equation:

$$ \frac{u_j^{n+1} - 2u_j^n + u_j^{n-1}}{\delta t^2} - c^2\,\frac{u_{j+1}^n - 2u_j^n + u_{j-1}^n}{\delta x^2} = 0. \tag{1.43} $$

Show that this scheme is second-order accurate in time and space (use (1.10))
and that the stability (CFL) condition is the same as that found for the
convection equation:

$$ \sigma = \frac{c\,\delta t}{\delta x} \le 1. \tag{1.44} $$


A solution of this exercise is proposed in Sect. 1.3 at page 26.

Exercise 1.6. Periodic initial conditions. Let us now assume that the initial
conditions $u_0(x)$ and $u_1(x)$ are periodic (with the same period $\tau$). Show that
the solution $u(x,t)$ of the wave equation is periodic in space (with period $\tau$)
and in time (with period $\tau/c$).


1. Justify the following algorithm:
   • for given initial conditions $u_0(x_j)$ and $u_1(x_j)$, compute $u_j^0$, $u_j^1$ as

     $$ u_j^0 = u_0(x_j), \quad u_j^1 = u_j^0 + \delta t\,u_1(x_j); \tag{1.45} $$

   • for $n \ge 1$, compute

     $$ u_j^{n+1} = 2(1 - \sigma^2)\,u_j^n + \sigma^2\left(u_{j-1}^n + u_{j+1}^n\right) - u_j^{n-1}. \tag{1.46} $$

2. Write a program to implement this algorithm.

3. Test the program for a string of length $\ell = 1$ and wave velocity $c = 2$.
   The initial data are $u_0(x) = \sin(2\pi x) + \sin(10\pi x)/4$ and $u_1(x) = 0$,
   corresponding to a string initially at rest. What is the time period of the
   solution?

4. Using nx = 50 points for the space discretization and nt = 50 points
   to discretize one period of time, superimpose on a single graph the exact
   and numerical solutions corresponding to one and two time periods. Verify
   that the numerical scheme preserves the periodicity of the solution. Same
   question for nx = 51. Comment on the results.








A solution of this exercise is proposed in Sect. 1.3 at page 26. 


Finite-length vibrating string. Consider the wave equation (1.38) with
initial conditions (1.39) and boundary conditions (1.40). We seek a solution
of the following form (also called the Fourier or elementary waves expansion):

$$ u(x,t) = \sum_{k \in \mathbb{N}^*} \hat{u}_k(t)\,\phi_k(x), \quad \phi_k(x) = \sin\left(\frac{k\pi}{\ell}x\right). \tag{1.47} $$

For each wave $\phi_k$, $k$ is the wave number and $\hat{u}_k$ the wave amplitude.


Exercise 1.7.


1. Derive and solve the ODE satisfied by a function $\hat{u}_k$.
2. Show that the exact solution for the finite-length vibrating string is

   $$ u(x,t) = \sum_{k \in \mathbb{N}^*}\left[A_k\cos\left(\frac{k\pi c}{\ell}t\right) + B_k\sin\left(\frac{k\pi c}{\ell}t\right)\right]\phi_k(x), \tag{1.48} $$

   with

   $$ A_k = \frac{2}{\ell}\int_0^\ell u_0(x)\,\phi_k(x)\,dx, \quad B_k = \frac{2}{k\pi c}\int_0^\ell u_1(x)\,\phi_k(x)\,dx. \tag{1.49} $$

   Find the time and space periods of the solution.

3. Write a program to solve the finite-length vibrating string problem, using
   the centered scheme (1.43). Hint: start from the program previously
   implemented and modify the boundary conditions.

   Find the exact solution corresponding to the following initial conditions:

   $$ u_0(x) = \sin\left(\frac{\pi}{\ell}x\right) + \frac{1}{4}\sin\left(10\,\frac{\pi}{\ell}x\right), \quad u_1(x) = 0. \tag{1.50} $$

   Plot the exact and numerical solutions at several times over one spatial
   period. Use the following numerical values: $c = 2$, $\ell = 1$, nx = 50,
   nt = 125.


A solution of this exercise is proposed in Sect. 1.3 at page 27. 


1.2.3 The Heat Equation 


Diffusion phenomena such as molecular and heat diffusion can be described
mathematically using the heat equation model

$$ \partial_t u - \kappa\,\partial^2_{xx} u = f(x,t), \quad \forall t > 0, \tag{1.51} $$

with the initial condition

$$ u(x,0) = u_0(x). \tag{1.52} $$

Example 1.4. The temperature $\theta$ of a heated body is a solution of the equation

$$ \partial_t \theta - \partial_x(\kappa\,\partial_x \theta) = f(x,t), \tag{1.53} $$

where $\kappa$ is the thermal diffusivity of the material and the function $f$ models
the heat source. In a homogeneous medium, $\kappa$ does not depend on the space
position $x$ and we retrieve the model equation (1.51).





Consider the problem of a wall of thickness $\ell$, initially at uniform
temperature $\theta_0$ (the room temperature). At time $t = 0$, the outside temperature (at
$x = 0$) suddenly rises to the value $\theta_s > \theta_0$, which is afterwards maintained
constant by an external heat source. The temperature at $x = \ell$ is kept at
its initial value $\theta_0$. The heat propagation within the wall is described by the
heat equation (1.51), with the unknown $u(x,t) = \theta(x,t) - \theta_0$, $f(x,t) = 0$, the
initial condition $u_0(x) = 0$, and Dirichlet boundary conditions

$$ u(0,t) = \theta_s - \theta_0 = u_s, \quad u(\ell,t) = 0, \quad \forall t > 0. \tag{1.54} $$


Infinite domain. For an infinitely thick wall ($\ell \to \infty$) we look for a solution
of the form

$$ u(x,t) = f(\eta), \quad \text{with } \eta = \frac{x}{2\sqrt{\kappa t}}. \tag{1.55} $$

Exercise 1.8. Show that the function $f$ defined above satisfies the following
ODE:

$$ \frac{d^2 f}{d\eta^2} + 2\eta\,\frac{df}{d\eta} = 0. \tag{1.56} $$

We introduce the following function, called the error function:

$$ \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-\zeta^2}\,d\zeta, \tag{1.57} $$

which satisfies $\operatorname{erf}(0) = 0$ and $\operatorname{erf}(\infty) = 1$. Find that the solution of the heat
equation for $\ell \to \infty$ is

$$ u(x,t) = \left[1 - \operatorname{erf}\left(\frac{x}{2\sqrt{\kappa t}}\right)\right]u(0,t). \tag{1.58} $$


A solution of this exercise is proposed in Sect. 1.3 at page 28. 


Remark 1.3. A change in the value of the initial condition at point x = 0 has 
as consequence the modification of the solution everywhere in the domain. In 
other words, the perturbation introduced at $x = 0$ is instantaneously
propagated in the computation domain ($u(x,t) > 0$, $\forall x$ in formula (1.58)). The
propagation speed is said to be infinite. We can also prove that the solution
at any point depends on all initial values $u_0(x)$. This implies that the domain
of dependence for the heat equation is the whole domain of definition. We re- 
call, for comparison, that for the wave equation the domain of dependence is 
restricted to the area bounded by the characteristics and that the propagation 
speed of the solution is finite. 











Finite domain. For a wall of finite thickness $\ell$, the elementary waves
expansion (1.47) is used.


Exercise 1.9. Write and solve the equation satisfied by the functions $\hat{u}_k$ in
the case of the heat equation. Verify that the solution of the heat equation
with boundary conditions (1.54) is

$$ u(x,t) = \left(1 - \frac{x}{\ell}\right)u_s + \sum_{k \in \mathbb{N}^*} A_k\exp\left(-\left(\frac{k\pi}{\ell}\right)^2\kappa t\right)\phi_k(x). \tag{1.59} $$

Show that $A_k = -2u_s/(k\pi)$.
A solution of this exercise is proposed in Sect. 1.3 at page 29. 


Remark 1.4. Let us compare the exact solution (1.59) to the exact solution of
the wave equation (1.48). The wave equation describes the transport in time of
the initial condition. The amplitude of each spatial wave $\phi_k(x)$ oscillates over
one time period without damping. The diffusion phenomenon described by
the heat equation is characterized by a fast decrease in time of the amplitude
of each wave $\phi_k(x)$ due to the presence of the exponential factor in (1.59).
This smoothing effect of the heat operator increases as the wave number $k$
becomes larger.

Exercise 1.10. Numerical solution. Consider first the following explicit
centered scheme for the heat equation:

$$ \frac{u_j^{n+1} - u_j^n}{\delta t} - \kappa\,\frac{u_{j+1}^n - 2u_j^n + u_{j-1}^n}{\delta x^2} = 0. \tag{1.60} $$

The stability condition for this scheme is:

$$ \kappa\,\frac{\delta t}{\delta x^2} \le \frac{1}{2}. \tag{1.61} $$

1. Write a program to solve the problem of the heat propagation in a finite
   thickness wall. Set $\kappa = 1$, $\ell = 1$, $u_s = 1$, and take nx = 50 discretization
   points in space. The time step $\delta t$ is calculated using (1.61). Plot the
   numerical solution for different times and compare to the exact solution
   (1.59). Compare also to the solution obtained for an infinite domain (1.58).
   Comment on the results for small $t$ and then for large $t$.

   Hint: the exact solution is computed to a fair degree of approximation by
   considering the first 20 wave numbers $k$ in the expansion (1.59).

2. Smoothing effect. Run the previous program for $u_s = 0$ and the initial
   condition defined by

   $$ u_0(x) = u(x,0) = \sin\left(\frac{\pi}{\ell}x\right) + \frac{1}{4}\sin\left(10\,\frac{\pi}{\ell}x\right). \tag{1.62} $$

   Compare the numerical solution to the exact solution (1.59). Comment
   on the results by comparing to those obtained for the wave equation with
   the same initial condition. Describe the damping of the waves defining the
   initial condition.


A solution of this exercise is proposed in Sect. 1.3 at page 29. 


1.3 Solutions and Programs 


Solution of Exercises 1.1 and 1.2 (the Absorption Equation) 


1. Inserting $u(t) = e^{-\alpha t}v(t)$ in (1.24) gives the differential equation $v'(t) =
e^{\alpha t} f(t)$ with the initial condition $v(0) = u(0) = u_0$. It is easy to integrate
this ODE to obtain the solution (1.25).

2. If $\alpha = \alpha(t)$ we obtain

$$ u(t) = e^{-\int_0^t \alpha(s)\,ds}\left[u_0 + \int_0^t e^{\int_0^s \alpha(z)\,dz} f(s)\,ds\right]. $$

3. Let us assume that the function $f(t) = f$ is constant. The expression
(1.25) becomes

$$ u(t) = \frac{f}{\alpha} + e^{-\alpha t}\left(u_0 - \frac{f}{\alpha}\right). $$

If $u_0 = f/\alpha$, the solution is constant: $u(t) = u_0$, $\forall t$.
For $\alpha > 0$, $u(t) \to f/\alpha$ for $t \to \infty$.
For $\alpha < 0$, $u(t) \to +\infty \times \operatorname{sign}(u_0 - f/\alpha)$.

4. If $\alpha = \alpha_r + i\alpha_i$, we obtain

$$ |u(t)| = |e^{-\alpha t}u_0| = |e^{-(\alpha_r + i\alpha_i)t}u_0| = e^{-\alpha_r t}|u_0| \to 0 $$

as $t \to \infty$ when $\alpha_r > 0$.


5. The script PDE_EulerExp.m (respectively PDE_RKutta4.m) implements
the explicit Euler scheme (respectively the fourth-order Runge-Kutta
scheme) to integrate the ODE $u'(t) = f(t,u)$. The right-hand side $f(t,u)$
is identified by the generic name fun inside these functions; the real name
of this function is specified by the user when the functions are called. The
two functions return a vector holding the numerical values $u_n$, computed
at discrete times uniformly distributed between $t_0$ and $t_1$.
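The listings of PDE_EulerExp.m and PDE_RKutta4.m are not reproduced here. As a minimal sketch of what the two schemes do, the script below integrates the test problem $u'(t) + 4u(t) = 0$ with both of them. The initial value, the time interval, and the step $h = 0.1$ are assumptions chosen for illustration, and the anonymous function fun only plays the role of the user-supplied right-hand side.

```matlab
% Explicit Euler and classical RK4 for u'(t) = fun(t,u), here fun = -alpha*u.
alpha = 4;  u0 = 1;              % assumed test data
t0 = 0;  t1 = 3;  h = 0.1;       % assumed interval and step
fun = @(t,u) -alpha*u;           % right-hand side (plays the role of "fun")
t = t0:h:t1;  n = numel(t);
uE = zeros(1,n);  uE(1) = u0;    % explicit Euler
uR = zeros(1,n);  uR(1) = u0;    % fourth-order Runge-Kutta
for k = 1:n-1
    uE(k+1) = uE(k) + h*fun(t(k), uE(k));
    k1 = fun(t(k),       uR(k));
    k2 = fun(t(k) + h/2, uR(k) + h/2*k1);
    k3 = fun(t(k) + h/2, uR(k) + h/2*k2);
    k4 = fun(t(k) + h,   uR(k) + h*k3);
    uR(k+1) = uR(k) + h/6*(k1 + 2*k2 + 2*k3 + k4);
end
uex  = u0*exp(-alpha*t);         % exact solution for f = 0
errE = max(abs(uE - uex));
errR = max(abs(uR - uex));
fprintf('max error: Euler %.2e, RK4 %.2e\n', errE, errR);
```

Both schemes are written in the same loop only to keep the sketch short; each costs one (Euler) or four (RK4) evaluations of fun per step.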








The MATLAB program PDE_absorption.m calls the functions PDE_EulerExp
and PDE_RKutta4, sending as input argument the name PDE_absorption_source,
which represents the function implementing the right-hand side of the
absorption equation. The results are displayed in two separate figures (Fig. 1.3). The
program also plots (Fig. 1.4) the bounds of the stability regions for the
considered numerical schemes (Exercise 1.2).

Fig. 1.3. Numerical solution of the ODE $u'(t) + 4u(t) = 0$, obtained using the
explicit Euler scheme (left) and the fourth-order Runge-Kutta scheme (right). Solid
line represents the exact solution.


Let us comment on the results displayed in Fig. 1.3. Everything goes well
for the smaller of the two discretization steps $h$: the numerical solution
approaches the exact solution, with a better approximation for the Runge-Kutta
scheme (in this last case the exact and numerical solutions are not
distinguishable in the graph). On the other hand, for the larger step,
$h = 2/\alpha = 1/2$, the stability limit of the explicit Euler scheme
is reached. The numerical solution remains bounded (which is no longer true
when $h > 2/\alpha$, a case to be tested) but does not converge.

Fig. 1.4. Stability regions for different numerical schemes.


The fourth-order Runge-Kutta (RKutta4) scheme has a wider stability
region and remains stable for the two values of the discretization step $h$
considered in this computation. Figure 1.4 shows that the stability region of the
RKutta4 scheme includes both regions of stability of the explicit Euler scheme
and RKutta2.

In light of these results, the RKutta4 scheme seems to be the best choice,
offering the best stability and accuracy. In fact, the choice of one scheme or
another for a practical application is motivated by a compromise between its
characteristics (accuracy, stability) and its computational costs (the RKutta4
scheme is approximately four times as expensive as the explicit Euler scheme
for the same discretization step).
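The comparison of stability regions can be made concrete on the negative real axis. The sketch below is not the plotting code of PDE_absorption.m; it only scans $z = -\alpha h$ for real $\alpha > 0$ and records the largest $|z|$ for which the amplification factor of each scheme stays bounded by 1. The polynomial forms of the amplification factors are the standard ones for these schemes applied to $u' = -\alpha u$; the scan range and resolution are assumptions.

```matlab
% Amplification factors G(z), z = -alpha*h, for u' = -alpha*u (alpha > 0 real).
G1 = @(z) 1 + z;                               % explicit Euler
G2 = @(z) 1 + z + z.^2/2;                      % RKutta2
G4 = @(z) 1 + z + z.^2/2 + z.^3/6 + z.^4/24;   % RKutta4
z   = -linspace(0, 4, 40001);                  % scan of the negative real axis
tol = 1e-12;                                   % tolerance for the boundary |G| = 1
lim1 = max(-z(abs(G1(z)) <= 1 + tol));         % largest stable alpha*h, Euler
lim2 = max(-z(abs(G2(z)) <= 1 + tol));         % largest stable alpha*h, RKutta2
lim4 = max(-z(abs(G4(z)) <= 1 + tol));         % largest stable alpha*h, RKutta4
alpha = 4;                                     % value used in Fig. 1.3
fprintf('h_max: Euler %.3f, RK2 %.3f, RK4 %.3f\n', [lim1 lim2 lim4]/alpha);
```

On the real axis Euler and RKutta2 share the same limit, so the difference between their stability regions only shows up for complex $\alpha$, which this scan does not cover.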


Solution of Exercise 1.3 (the Convection Equation) 
1. The change of variables can be written as

$$ \begin{pmatrix} X \\ T \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \mu \end{pmatrix}\begin{pmatrix} x \\ t \end{pmatrix}, $$

and it is one-to-one and onto (i.e. bijective) if $\alpha\mu \ne \beta\gamma$. Differentiation with
respect to the new variables gives

$$ \partial_t u = \beta\,\partial_X U + \mu\,\partial_T U, \quad \partial_x u = \alpha\,\partial_X U + \gamma\,\partial_T U, \tag{1.63} $$

and we find that $U$ is the solution of the PDE:

$$ (\beta + c\alpha)\,\partial_X U + (\mu + c\gamma)\,\partial_T U = 0. $$

For $\beta = -c\alpha$, we get $(\mu + c\gamma)\,\partial_T U = 0$, and, consequently, $\partial_T U = 0$. This last
equation has $U(X,T) = F(X)$ as a solution, where $F$ is an arbitrary smooth
function. Therefore we can put $u(x,t) = F(X) = F(\alpha x + \beta t) = F(\alpha x - \alpha c t) =
G(x - ct)$, where $G$ is again an arbitrary function. Finally, imposing the initial
condition (1.29) we get $u(x,t) = u_0(x - ct)$. This implies in particular that
the solution remains unchanged along the characteristic lines, that is

$$ u(x,t) = u_0(\xi) \quad \text{for all } (x,t) \in C_\xi, \text{ where } x - ct = \xi. $$

In the plane $(x,t)$, the characteristic curves $C_\xi$ are straight lines of positive
slopes $1/c$ (see Fig. 1.5).

2. To derive the solution $u(x,T)$ for $x \in [a,b]$ and $T < T^* = (b - a)/c$, we
draw the characteristics through the points $(x,T)$ and use the fact that the
solution $u(x,t)$ is constant along a characteristic line. Two cases are possible
(see Fig. 1.5):

• the characteristic ($C_\xi$ in the figure) crosses the segment $[a,b]$; this is the
  case for points located at $x \ge x_T$, with $x_T = a + cT$. The solution will be
  therefore determined by the initial condition:

  $$ u(x,T) = u_0(\xi) = u_0(x - cT). $$

• the characteristic (also drawn in the figure) does not cross the segment $[a,b]$; in
  this case a boundary condition is needed. If we impose $u(a,t) = \varphi(t)$, since
  the information will be searched through the characteristics back to this
  boundary condition, the solution is calculated as

  $$ u(x,T) = \varphi\left(T - \frac{x - a}{c}\right). $$

Note that the initial condition $u_0$ is completely "evacuated" from the
domain $[a,b]$ after a time value $T^* = (b - a)/c$.


Fig. 1.5. Using characteristics to calculate the solution of the convection equation.

For $c < 0$, the boundary condition must be imposed on the right-hand side
of the domain by setting $u(b,t) = \varphi(t)$.
3. The function PDE_conv_exact_sol.m computes the exact solution for a given 
time T. Note in particular the use of the MATLAB command find to imple- 
ment the formula (1.33). 
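The following lines sketch how find can separate the two branches of (1.33). This is not the listing of PDE_conv_exact_sol.m: it is written as a script with anonymous functions, the speed c appears explicitly, and all numerical values (a, b, c, J, T) and the names u0, phi, ini, bnd are assumptions made for illustration.

```matlab
% Exact solution (1.33) of the convection equation at time T (sketch).
a = 0;  b = 1;  c = 4;  J = 40;  T = 0.1;   % assumed data, c > 0
u0  = @(x) x;                   % initial condition u_0(x)
phi = @(t) sin(10*pi*t);        % boundary condition at x = a
x   = linspace(a, b, J+1);      % grid x_0, ..., x_J stored in x(1), ..., x(J+1)
uex = zeros(size(x));
ini = find(x - c*T >= a);       % characteristics reaching t = 0 inside [a,b]
bnd = find(x - c*T <  a);       % characteristics reaching the boundary x = a
uex(ini) = u0(x(ini) - c*T);
uex(bnd) = phi(T - (x(bnd) - a)/c);
plot(x, uex);  xlabel('x');  ylabel('u(x,T)');
```

A logical mask such as x - c*T < a could be used directly as an index instead of the output of find; both forms select the same entries.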
4. We recognize a discretization of equation (1.28) where the time derivative
$\partial_t u$ is approximated by $D^+ u(t_n)$ and the space derivative $\partial_x u$ by $D^- u(x_j)$.
The choice of the upwind approximation for $\partial_x u$ is imposed by the fact that
$c > 0$ and therefore the information comes from the left.

Fig. 1.6. Geometrical interpretation of the CFL condition for the upwind scheme.


The characteristic going through the point $(x_j, t_{n+1})$ is drawn in Fig. 1.6. This
line cuts the horizontal line $t = t_n$ at a point $P$ located between $x_{j-1}$ and
$x_j$, at a distance $c\,\delta t$ from $x_j$ and $\delta x - c\,\delta t$ from $x_{j-1}$. Since the solution is
constant along a characteristic, necessarily $u_j^{n+1} = u_P^n$.

It is interesting to note from these geometrical considerations that the
scheme (1.36) is nothing else but a linear interpolation between the values of
the solution at points $x_{j-1}$ and $x_j$:

$$ u_j^{n+1} = u_P^n = \frac{\delta x - c\,\delta t}{\delta x}\,u_j^n + \frac{c\,\delta t}{\delta x}\,u_{j-1}^n. $$

The CFL condition $c\,\delta t \le \delta x$ can be thus regarded as a criterion imposing the
positivity of the interpolation coefficients; in other words, the point $P$ must
lie within the interval $[x_{j-1}, x_j]$.
Some computing tips may be useful at this point. First of all, we must be
careful and make all array indices start from 1 (and not 0 as in mathematical
expressions). Then, the solution at time $t_{n+1}$ will be computed according to

$$ u_j^{n+1} = \left(1 - \frac{c\,\delta t}{\delta x}\right)u_j^n + \frac{c\,\delta t}{\delta x}\,u_{j-1}^n, $$

that is,

$$ u_j^{n+1} = (1 - \sigma)\,u_j^n + \sigma\,u_{j-1}^n, \quad \sigma = \frac{c\,\delta t}{\delta x}. $$

Since we seek a solution for $n = N$, we save storage memory by using a single
array u for the calculation. If the solution is needed at intermediate times, it
will be either written to a file or graphically displayed. With this programming
trick, the previous relation becomes

$$ u(j) = (1 - \sigma)\,u(j) + \sigma\,u(j - 1), $$

and the values at time $t_n$ will be replaced by new values at time $t_{n+1}$. We also
must be careful to use in the numerical scheme the values $u_{j-1}$ before they
are modified (i.e., at previous time $t_n$). This is achieved by using an inverse
loop ($j = J+1, J, \ldots, 2$); for $j = 1$ the boundary condition is imposed.
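The in-place update with a reverse loop can be sketched as follows. This is an illustration, not the listing of PDE_convection.m; the values of a, b, c, J, N and the function handles u0 and phi are assumptions, and the exact solution is recomputed inline from (1.33) so that the script is self-contained.

```matlab
% Upwind scheme (1.36) with a single array u, overwritten in place (sketch).
a = 0;  b = 1;  c = 4;  J = 40;  sigma = 0.8;  N = 20;   % assumed data
u0  = @(x) x;                    % initial condition
phi = @(t) sin(10*pi*t);         % boundary condition at x = a
dx = (b - a)/J;  dt = sigma*dx/c;
x  = a + (0:J)*dx;               % x(j) holds the grid point x_{j-1}
u  = u0(x);
for n = 1:N
    for j = J+1:-1:2             % reverse loop: u(j-1) is still at level n
        u(j) = (1 - sigma)*u(j) + sigma*u(j-1);
    end
    u(1) = phi(n*dt);            % boundary value at the new time level
end
T = N*dt;
uex = u0(x - c*T);               % exact solution (1.33), first branch
m = (x - c*T < a);               % points fed by the boundary condition
uex(m) = phi(T - (x(m) - a)/c);
err = max(abs(u - uex));
plot(x, uex, '-', x, u, 'o');  legend('exact', 'upwind');
```

Running the inner loop forward (j = 2, ..., J+1) would overwrite u(j-1) before it is used and would therefore not implement (1.36).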

This algorithm is implemented in the program PDE_convection.m. This 
program calls the functions PDE_conv_bound_cond and PDE_conv_init_cond, 
which define, in separate files, the initial condition and, respectively, the 
boundary condition. 

The solution for chosen intermediate times is represented in Fig. 1.7 and
compared to the exact solution (computed by the function PDE_conv_exact).
Note that starting from time $T^* = (b - a)/c$, the initial data leaves the computation
domain, and the solution depends only on the boundary condition.

Fig. 1.7. Computation of the solution of the convection equation using the upwind
scheme (CFL = 0.8). Solid line represents the exact solution.

An interesting phenomenon can be observed when one looks closer at the
solution in the region where the derivative of the exact solution is
discontinuous (i.e., where the part of the solution depending on the initial condition
is connected to that depending on the boundary condition). There is a
numerical smoothing of this sharp transition. What is interesting here is that
this observed dissipation has no physical meaning and is exclusively due to
the numerical scheme. The upwind scheme is therefore said to be dissipative.
We shall discuss in detail dissipation effects, in a more physical context, when
analyzing the heat equation.

A computation performed for $\sigma = 1$ (or CFL = 1) gives results that are
perfectly superimposed on the exact solution. This is not surprising, since
the upwind scheme becomes in this case an exact relation: $u_j^{n+1} = u_{j-1}^n$.
In practice, the convection speed $c$ and the space discretization step $\delta x$ are
generally not constant, making impossible a computation with $\sigma = 1$.

The computation for $\sigma = 1.1$ illustrates the loss of stability of the upwind
scheme when $\sigma > 1$.
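The role of $\sigma$ can also be checked without running the full computation. The sketch below uses a standard Fourier (von Neumann) argument, which is a complement to the geometrical reasoning of the text and is not taken from it: inserting $u_j^n = G^n e^{ij\theta}$ into (1.36) gives the amplification factor $G(\theta) = 1 - \sigma + \sigma e^{-i\theta}$. The sampling of $\theta$ is an assumption.

```matlab
% Amplification factor of the upwind scheme (1.36) for several CFL numbers.
theta  = linspace(0, pi, 181);          % sampled phase angles
sigmas = [0.8, 1, 1.1];                 % CFL numbers discussed in the text
Gmax = zeros(size(sigmas));
Gmin = zeros(size(sigmas));
for s = 1:numel(sigmas)
    G = 1 - sigmas(s) + sigmas(s)*exp(-1i*theta);
    Gmax(s) = max(abs(G));              % > 1 means some mode is amplified
    Gmin(s) = min(abs(G));              % < 1 means some mode is damped
end
fprintf('sigma = %.1f: min|G| = %.2f, max|G| = %.2f\n', [sigmas; Gmin; Gmax]);
```

For $\sigma = 0.8$ the short waves are damped (numerical dissipation), for $\sigma = 1$ every mode has $|G| = 1$, and for $\sigma = 1.1$ the shortest wave is amplified by a factor 1.2 per time step.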





Solution of Exercise 1.4 (the Wave Equation) 
Starting from (1.63), we obtain for the second derivatives

$$ \partial^2_{tt} u = \beta^2\,\partial^2_{XX} U + \mu^2\,\partial^2_{TT} U + 2\beta\mu\,\partial^2_{XT} U, $$
$$ \partial^2_{xx} u = \alpha^2\,\partial^2_{XX} U + \gamma^2\,\partial^2_{TT} U + 2\alpha\gamma\,\partial^2_{XT} U, $$

and conclude that $U$ is solution of the PDE:

$$ (\mu^2 - c^2\gamma^2)\,\partial^2_{TT} U + (\beta^2 - c^2\alpha^2)\,\partial^2_{XX} U + 2(\beta\mu - c^2\alpha\gamma)\,\partial^2_{XT} U = 0. $$

For $\mu = c\gamma$ and $\beta = -c\alpha$, the equation becomes $-4c^2\alpha\gamma\,\partial^2_{XT} U = 0$,
which implies that $\partial^2_{XT} U = 0$, or again $\partial_T(\partial_X U) = 0$. From the previous
relationship we infer that there exist two functions $F(X)$ and $G(T)$ such that
$U(X,T) = F(X) + G(T)$. Since $X = \alpha(x - ct)$ and $T = \gamma(x + ct)$, we can
choose $\alpha = \gamma = 1$, and the solution becomes

$$ u(x,t) = f(x - ct) + g(x + ct). $$

Imposing initial conditions for $u(x,t)$ and $\partial_t u(x,t) = -c f'(x - ct) + c g'(x + ct)$,
we obtain the system of equations

$$ f'(x) + g'(x) = u_0'(x), \qquad -f'(x) + g'(x) = \frac{1}{c}\,u_1(x), $$

giving expressions for $f'$ and $g'$ and finally the formula (1.42) for $u$.

Solution of Exercise 1.5


In the discretization proposed for the wave equation, the second partial
derivatives $\partial^2_{xx} u$ and $\partial^2_{tt} u$ are approximated by centered finite differences (see
equation (1.10)). One can easily show that the scheme (1.43) is of second order in
space and also in time.

The stability condition (1.44) expresses that the domain of dependence (see
Fig. 1.2) bounded by the two characteristics starting from the point $(x_j, t_{n+1})$
must lie inside the triangle $\{(x_j, t_{n+1}), (x_{j-1}, t_n), (x_{j+1}, t_n)\}$ defined by the
three-point stencil used by the numerical scheme. In other words, if the time
step is larger than the critical value $\delta x/c$, the information searched for by
the characteristics will be found outside the interval $[x_{j-1}, x_{j+1}]$ used by the
scheme. This is not in accord with the physical phenomenon described by the
wave equation and therefore results in instability of the numerical scheme.

From the formula (1.42), it can be easily checked that $u(x + \tau, t) = u(x,t)$
and $u(x, t + \tau/c) = u(x,t)$. The solution $u(x,t)$ is hence periodic in time and
space, with period $\tau$ in space and $\tau/c$ in time.











Solution of Exercise 1.6 


In the proposed algorithm, the scheme (1.43) is used together with the
computation of the solution for the first time step based on the approximation
$\partial_t u(x_j, 0) \approx (u_j^1 - u_j^0)/\delta t = u_1(x_j)$.

This algorithm is implemented in the program PDE_wave_infstring.m
and the initial conditions in files PDE_wave_infstring_u0.m and, respectively,
PDE_wave_infstring_u1.m.

It is worth explaining some programming tricks used in this program. The
periodicity condition (also satisfied by the initial condition $u_0(x) = u_0(x + \tau)$
with $\tau = 1$) is translated in discrete form by $u_{nx+1} = u_1$, since the spatial
discretization is built such that $x_1 = 0$ and $x_{nx+1} = 1$. In order to fully exploit
the capabilities of MATLAB in terms of vectorial programming, we define the
arrays jp and jm corresponding to indices $j + 1$, respectively $j - 1$, for all
discretization points. The periodicity is expressed then by setting jp(nx) = 1,
jm(1) = nx, and the numerical scheme (1.46) is written within a single line
of code:

    u2=-u0+coeff*u1+sigma2*(u1(jm)+u1(jp)).

Here u2 corresponds to the array $(u^{n+1})_j$, u1 to $(u^n)_j$, and u0 to $(u^{n-1})_j$.
The advantage of this compact programming is to avoid loops interrupted by
specific treatments of the points on the boundaries. We shall use this simple
programming tip in a more complicated project (Chap. 12).
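A minimal sketch of this vectorized periodic scheme is given below. It is not the listing of PDE_wave_infstring.m: the script layout, the handle u0f, and the choice of a grid of nx points on $[0, 1)$ with $x_{nx+1}$ identified with $x_1$ are assumptions. Only the names u0, u1, u2, jp, jm, coeff, and sigma2 follow the text.

```matlab
% Centered scheme (1.46) with periodic index arrays jp, jm (sketch).
c = 2;  L = 1;  nx = 50;  nt = 50;        % data of Exercise 1.6
dx = L/nx;  x = (0:nx-1)*dx;              % x(1) = 0; x(nx+1) = L coincides with x(1)
Tper = L/c;  dt = Tper/nt;                % nt time steps per time period
sigma = c*dt/dx;  sigma2 = sigma^2;  coeff = 2*(1 - sigma2);
jp = [2:nx, 1];                           % index j+1 with jp(nx) = 1
jm = [nx, 1:nx-1];                        % index j-1 with jm(1) = nx
u0f = @(x) sin(2*pi*x) + sin(10*pi*x)/4;  % initial position
u0 = u0f(x);                              % level n = 0
u1 = u0;                                  % level n = 1 from (1.45) with u_1(x) = 0
for n = 2:nt                              % compute levels n = 2, ..., nt
    u2 = -u0 + coeff*u1 + sigma2*(u1(jm) + u1(jp));
    u0 = u1;  u1 = u2;
end
err = max(abs(u1 - u0f(x)));              % after one period the exact solution is u_0
fprintf('sigma = %g, error after one period = %.2e\n', sigma, err);
```

Changing nx to 51 in this sketch gives $\sigma > 1$, the unstable case discussed in the exercise.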

The numerical results are displayed in Fig. 1.8. The period of the solution
in time is $1/c = 0.5$. For nx = nt = 50, the CFL number is $\sigma = 1$. The
numerical scheme propagates the initial condition correctly and preserves the
periodicity in time. The solution after one time period coincides with the exact
solution (which is in fact the initial condition $u_0$). Unlike the upwind scheme
used for the convection equation, this centered scheme does not generate any
numerical diffusion (even for smaller nx corresponding to CFL < 1). For
nx = 51 the scheme becomes unstable because CFL > 1. It is also interesting
to note that the instability of the scheme appears only after some time (here
after one time period).

Fig. 1.8. Numerical solution of the wave equation for the infinite vibrating string
(periodicity conditions). Initial condition: $u(x,0) = \sin(2\pi x) + \sin(10\pi x)/4$ and
$\partial_t u(x,0) = 0$. Comparison with the exact solution for CFL = 1 (top) and CFL
> 1 (bottom) after one period in time (left) and two periods in time (right).


Solution of Exercise 1.7 


The amplitude $\hat{u}_k$ satisfies the ODE

$$ \frac{d^2 \hat{u}_k}{dt^2} + c^2\left(\frac{k\pi}{\ell}\right)^2 \hat{u}_k = 0, \tag{1.64} $$

which is often encountered in physics. It models in particular the oscillations
of a pendulum. The general solution of this ODE being

$$ \hat{u}_k(t) = A_k\cos\left(\frac{k\pi c}{\ell}t\right) + B_k\sin\left(\frac{k\pi c}{\ell}t\right), \tag{1.65} $$

the expression (1.48) is straightforward. The coefficients $A_k$, $B_k$ are computed
using the orthogonality of the trigonometric functions $\phi_k$ on $[0,\ell]$.

We observe that the solution is periodic, of period $2\ell$ in space and $2\ell/c$ in
time. The initial condition (1.50) corresponds to a decomposition in
elementary waves with

$$ k = \{1, 10\}, \quad A_k = \{1, 1/4\}, \quad B_k = \{0, 0\}. $$

The exact solution is given by (1.48) with these values.

The program PDE_wave_fstring.m computes the solution for the finite- 
length string. The initial condition is computed in PDE_wave_fstring_in.m 
and the exact solution in PDE_wave_fstring_exact.m. 

Note that once again, we use vector notation (avoiding for loops) for the 
centered scheme. The computation points corresponding to the boundaries are 
not modified during the loop in time, and preserve their initial values (which 
respect the imposed boundary conditions). Figure 1.9 displays a comparison 
between the exact solution and the numerical solution for two different time 
instants. 
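The vectorized treatment of the fixed ends can be sketched as follows. This is not the listing of PDE_wave_fstring.m; the grid of nx intervals (nx+1 points), the handles u0f and uexf, and the index array jin are assumptions. The exact solution is (1.48) with $A_1 = 1$, $A_{10} = 1/4$, $B_k = 0$.

```matlab
% Centered scheme (1.43) with homogeneous Dirichlet conditions (sketch).
c = 2;  L = 1;  nx = 50;  nt = 125;       % values of Exercise 1.7
dx = L/nx;  x = (0:nx)*dx;                % assumption: nx intervals, nx+1 points
dt = (2*L/c)/nt;                          % nt steps per time period 2*L/c
sigma2 = (c*dt/dx)^2;                     % CFL number squared (CFL = 0.8 here)
u0f  = @(x) sin(pi*x/L) + sin(10*pi*x/L)/4;
uexf = @(x,t) cos(pi*c*t/L)*sin(pi*x/L) + cos(10*pi*c*t/L)*sin(10*pi*x/L)/4;
u0 = u0f(x);  u0([1 end]) = 0;            % enforce u(0,t) = u(L,t) = 0 exactly
u1 = u0;                                  % first step (1.45) with u_1(x) = 0
jin = 2:nx;                               % interior indices
for n = 2:nt
    u2 = u1;                              % boundary entries keep their values
    u2(jin) = -u0(jin) + 2*(1 - sigma2)*u1(jin) ...
              + sigma2*(u1(jin-1) + u1(jin+1));
    u0 = u1;  u1 = u2;
end
t = nt*dt;                                % one time period
err = max(abs(u1 - uexf(x, t)));
plot(x, uexf(x, t), '-', x, u1, 'o');  legend('Exact sol.', 'Num. sol.');
```

Since CFL < 1 here, the two waves of the initial condition travel with slightly wrong numerical speeds, so a small visible difference from the exact solution after one period is expected.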





[Figure: two panels titled "time=0.6 CFL=0.8" and "time=1 CFL=0.8", each showing the exact solution (Exact sol.) and the numerical solution (Num. sol.) as functions of x on [0, 1].]


Fig. 1.9. Numerical solution of the wave equation for the finite-length string. Initial
condition: $u(x,0) = \sin(\pi x) + \sin(10\pi x)/4$ and $\partial_t u(x,0) = 0$. Comparison with the
exact solution (one time period corresponds to t = 1).


Solution of Exercise 1.8 


The partial derivatives can be written in terms of $f$ as

$$ \frac{\partial u}{\partial t} = -\frac{\eta}{2t}\,\frac{df}{d\eta}, \qquad \frac{\partial^2 u}{\partial x^2} = \frac{1}{4\kappa t}\,\frac{d^2 f}{d\eta^2}, $$

hence the ODE (1.56) is satisfied by $f$. After integration, we obtain

$$ \frac{df}{d\eta} = a\,e^{-\eta^2} \implies u(x,t) = f(\eta) = B + A\operatorname{erf}(\eta). $$

Taking into account the properties of the function erf when imposing
boundary conditions, we can easily obtain the formula (1.58).
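A quick numerical check of formula (1.58) is sketched below: the heat equation residual $\partial_t u - \kappa\,\partial^2_{xx} u$ is approximated by centered finite differences at one sample point. The values of $\kappa$, $u_s$, the sample point, and the increments ht, hx are assumptions; erf is the built-in MATLAB function.

```matlab
% Check that u = (1 - erf(eta))*us satisfies the heat equation (sketch).
kappa = 1;  us = 1;                              % assumed data
u = @(x,t) us*(1 - erf(x./(2*sqrt(kappa*t))));   % formula (1.58) with u(0,t) = us
x = 0.3;  t = 0.05;                              % assumed sample point
ht = 1e-4;  hx = 1e-3;                           % assumed finite-difference increments
ut  = (u(x, t+ht) - u(x, t-ht))/(2*ht);          % centered difference in time
uxx = (u(x+hx, t) - 2*u(x, t) + u(x-hx, t))/hx^2;% centered difference in space
residual = ut - kappa*uxx;
fprintf('u_t = %.4f, kappa*u_xx = %.4f, residual = %.1e\n', ut, kappa*uxx, residual);
```

The same handle also reproduces the boundary behavior used to fix the constants: u(0,t) equals us, and u(x,t) tends to 0 for large x.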
Solution of Exercise 1.9


The amplitude $\hat{u}_k$ is solution of the ODE:

$$ \frac{d\hat{u}_k}{dt} + \kappa\left(\frac{k\pi}{\ell}\right)^2 \hat{u}_k = 0, $$

and has the analytical form

$$ \hat{u}_k(t) = A_k\exp\left(-\left(\frac{k\pi}{\ell}\right)^2\kappa t\right). $$

We can easily check that any elementary wave $\hat{u}_k(t)\phi_k(x)$ is a solution of the
heat equation, but it does not satisfy the boundary conditions (1.54). This is
the reason why a linear function of $x$ (which is also a solution of the heat
equation) has been added to obtain the final form of the exact solution (1.59).
We note that this is allowed by the linearity of the heat equation.

Finally, the coefficients $A_k$ are calculated using the orthogonality of the $\phi_k$
functions:

$$ A_k = -\frac{2u_s}{\ell}\int_0^\ell\left(1 - \frac{x}{\ell}\right)\sin\left(\frac{k\pi}{\ell}x\right)dx = -\frac{2u_s}{k\pi}. $$
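The closed form of the coefficients can be cross-checked by numerical quadrature. In this sketch the values of $u_s$ and $\ell$ (variable L), the number K of coefficients, and the variable names are assumptions.

```matlab
% Fourier coefficients A_k of (1.59): quadrature versus the closed form.
us = 1;  L = 1;  K = 5;                 % assumed data; first K coefficients
Ak = zeros(1, K);
for k = 1:K
    g = @(x) (1 - x/L).*sin(k*pi*x/L);  % integrand for wave number k
    Ak(k) = -(2*us/L)*integral(g, 0, L);
end
Ak_formula = -2*us./((1:K)*pi);         % closed form -2*us/(k*pi)
disp([Ak; Ak_formula]);
```

The slow $1/k$ decay of these coefficients reflects the mismatch at $t = 0$ between the zero initial condition and the boundary value $u_s$ at $x = 0$.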


Solution of Exercise 1.10 








The MATLAB program PDE_heat.m answers questions 1 and 2. The ini- 
tial condition is computed in PDE_heat_u0.m and the exact solution in 
PDE_heat_uex.m (the erf function is already available in the standard MAT- 
LAB package). Numerical results (see Fig. 1.10) confirm the fact that the 
erf-solution (1.58) obtained for an infinite domain is a good approximation 
for small times t (this is the main reason why it is often used in practice by 
engineers). For longer times t, the exact solution (and hopefully the numerical 
one as well) converges to the steady-state solution (i.e., independent of time) 
$u(x) = (1 - x/\ell)\,u_s$.
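The comparison described above can be sketched in a few lines. This is not the listing of PDE_heat.m: the grid of nx intervals, the safety factor 0.4 in the time step (slightly below the limit (1.61)), the final time tend, and all variable names are assumptions. The exact solution uses the first 20 wave numbers of (1.59), as suggested in the hint of Exercise 1.10.

```matlab
% Explicit centered scheme (1.60) for the wall problem (sketch).
kappa = 1;  L = 1;  us = 1;  nx = 50;        % values of Exercise 1.10
dx = L/nx;  x = (0:nx)*dx;                   % assumption: nx intervals
dt = 0.4*dx^2/kappa;                         % assumed step, below the limit (1.61)
tend = 0.05;  nsteps = round(tend/dt);       % assumed final time
u = zeros(size(x));  u(1) = us;              % u_0 = 0, u(0,t) = us, u(L,t) = 0
jin = 2:nx;                                  % interior indices
for n = 1:nsteps
    u(jin) = u(jin) + kappa*dt/dx^2*(u(jin-1) - 2*u(jin) + u(jin+1));
end
t = nsteps*dt;
k = (1:20)';                                 % first 20 wave numbers
Ak = -2*us./(k*pi);
uex  = (1 - x/L)*us + sum(Ak.*exp(-(k*pi/L).^2*kappa*t).*sin(k*pi*x/L), 1);
uerf = us*(1 - erf(x/(2*sqrt(kappa*t))));    % infinite-domain solution (1.58)
errNum = max(abs(u - uex));
errErf = max(abs(uerf - uex));
plot(x, uex, '-', x, u, 'o', x, uerf, '--');
legend('exact (1.59)', 'numerical', 'erf solution (1.58)');
```

Increasing tend in this sketch makes the erf solution drift away from (1.59), while the numerical solution approaches the linear steady state.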

The diffusion phenomenon described by the heat equation is characterized
by a time scale $t_0 = \ell^2/\kappa$ (see the expression of $\eta$ in (1.55)). Consequently,
the effective speed of propagation $c_0 = \ell/t_0 = \kappa/\ell$ of a thermal perturbation
decreases with the distance to the source. This accounts for the poor efficiency
of diffusion systems to propagate heat for large distances or time!

Let us imagine a domestic heating system based on diffusion only. The
thermal diffusivity of air being $\kappa \approx 20$ [mm²/s], the heating effect will be felt
at a distance of 1 cm after 5 seconds and at 1 meter after $5 \cdot 10^4$ s ≈ 14 hours!

Fig. 1.10. Numerical solution of the heat equation for $x \in [0,\ell]$, $\kappa = 1$, and
boundary conditions $u(0,t) = 1$, $u(\ell,t) = 0$. Initial condition $u(0,0) = 1$, $u(x,0) = 0$, $x > 0$.
Comparison with the exact solution (1.59) and the erf-solution (1.58) for an infinite
domain.


Fortunately, real heating systems are more efficient due to other phenomena 
(such as air convection and radiation). 
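The orders of magnitude quoted for the diffusion-only heating system follow directly from $t_0 = \ell^2/\kappa$. The short script below reproduces the arithmetic; the variable names are arbitrary and the diffusivity is the value quoted in the text.

```matlab
% Diffusion time scale t0 = l^2/kappa for the two distances of the text.
kappa = 20;                    % thermal diffusivity of air [mm^2/s]
dist  = [10, 1000];            % distances [mm]: 1 cm and 1 m
t0    = dist.^2/kappa;         % diffusion time scale [s]
c0    = dist./t0;              % effective propagation speed kappa/l [mm/s]
fprintf('%6.0f mm: t0 = %8.0f s (%.1f h), c0 = %.3g mm/s\n', ...
        [dist; t0; t0/3600; c0]);
```

Multiplying the distance by 100 multiplies the time by $10^4$ and divides the effective speed by 100.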

For the next question, it is easy to return to the previous program 
(PDE_heat.m) to implement the new initial condition (1.62). The lines to 
modify are written as comments. The results (see Fig. 1.11) clearly show that 
the wave of highest wave number, equivalent to highest frequency (k = 10 
in our case), is first damped. The solution tends to the constant steady-state 
solution u(x) = 0. Recall, for comparison, the behavior of the wave equation, 
for which the same initial condition was transported without damping of the 
wave amplitudes (see Fig. 1.9). 
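
The selective damping can be quantified with a small computation. We assume here (this should be checked against (1.62)) that the initial condition is a sum of modes $\sin(k\pi x/\ell)$ vanishing at both ends of the interval. Each such mode of the heat equation is simply multiplied by $e^{-\kappa (k\pi/\ell)^2 t}$; the value $\ell = 1$ and the time $t$ below are our own choices.

```matlab
kappa = 1; ell = 1;                   % kappa = 1 as in Fig. 1.11; ell = 1 is an assumption
k     = [1 10];                       % wave numbers of the two modes
t     = 0.01;                         % an arbitrary early time
damp  = exp(-kappa*(k*pi/ell).^2*t);  % amplitude factor of each mode at time t
fprintf('k = %2d: amplitude multiplied by %.2e\n', [k; damp]);
```

At this time the mode $k = 10$ has essentially disappeared while the mode $k = 1$ has lost less than 10% of its amplitude.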


Chapter References 


Extensive analysis of numerical methods for solving ODEs or PDEs can be 
found in a large number of books, ranging from the classic text by Richtmyer 
and Morton (1967), to Lambert (1973), John (1978), Mitchell and Griffiths 
(1980), Butcher (1987), and more recently Trefethen (1996). Introductions to 

Fig. 1.11. Numerical solution of the heat equation for $x \in [0,\ell]$, $\kappa = 1$, and
boundary conditions $u(0,t) = 0$, $u(\ell,t) = 0$. Initial condition: $u(x,0) = \sin(\pi x) +
\sin(10\pi x)/4$. Comparison to the exact solution (1.59). Note the early damping of
high frequency waves.

Nonlinear Differential Equations: Application
to Chemical Kinetics 


Project Summary 


Level of difficulty: 1 


Keywords: Nonlinear system of differential equations, stability, 
integration schemes, Euler explicit scheme, Runge- 
Kutta scheme, delayed differential equation 





Application fields: Chemical kinetics, biology 


2.1 Physical Problem and Mathematical Modeling 


The laws governing chemical kinetics can be written as systems of ordinary 
differential equations. In the case of complex reactions with several different 
participating molecules, these equations are nonlinear and present interesting 
mathematical properties (stability, periodicity, bifurcation, etc.). The numer- 
ical solution of this type of system is a domain of study in itself with a flour- 
ishing literature. Very efficient numerical methods to solve systems of ODEs 
are implemented in MATLAB, as in most such software. The first model of 
reaction that we shall study in this chapter can be completely solved using 
such a standard package. We will therefore use the ode solvers provided by 
MATLAB, assuming that the user masters the underlying theory and the
basic concepts such as convergence, stability, and precision (see Chap. 1). 

The other model includes a delay term. We choose here not to use the delay 
equation solver dde23 and describe a specific numerical treatment. Both are 
examples of models presented in Hairer, Norsett, and Wanner (1987). 

We first study the so-called Brusselator model, which involves six reactants 
and products A, B, D, E, X, and Y: 

\[
\begin{aligned}
A &\xrightarrow{v_1} X, \\
B + X &\xrightarrow{v_2} Y + D, && \text{bimolecular reaction}, \\
2X + Y &\xrightarrow{v_3} 3X, && \text{trimolecular autocatalytic reaction}, \\
X &\xrightarrow{v_4} E,
\end{aligned}
\tag{2.1}
\]

where $v_i$ are the constant chemical reaction rates. The concentrations of the
species as functions of time $t$ are denoted by $A(t)$, $B(t)$, $D(t)$, $E(t)$, $X(t)$,
and $Y(t)$. Mass conservation in the chemical reactions leads to the following
differential equations:


\[
\begin{aligned}
A' &= -v_1 A, \\
B' &= -v_2 BX, \\
D' &= v_2 BX, \\
E' &= v_4 X, \\
X' &= v_1 A - v_2 BX + v_3 X^2 Y - v_4 X, \\
Y' &= v_2 BX - v_3 X^2 Y.
\end{aligned}
\]


We start by eliminating the two equations governing the production of species 
D and E, since they are independent of the four others: 


\[
\begin{aligned}
A' &= -v_1 A, \\
B' &= -v_2 BX, \\
X' &= v_1 A - v_2 BX + v_3 X^2 Y - v_4 X, \\
Y' &= v_2 BX - v_3 X^2 Y.
\end{aligned}
\]


The system can be further simplified by assuming that A and B are kept
constant and by taking all reaction rates equal to 1. The resulting system of
two equations with two unknowns can be written as the initial value problem


\[
\begin{cases}
U'(t) = F(U(t)), \\
U(0) = U_0 = (X_0, Y_0)^T,
\end{cases}
\tag{2.2}
\]

where $U(t) = (X(t), Y(t))^T$ is the vector modeling the variations of concentration
of substances X and Y, and

\[
F(U) = \begin{pmatrix} A - (B+1)X + X^2 Y \\ BX - X^2 Y \end{pmatrix}.
\]


2.2 Stability of the System 


The stability of the system is its propensity to evolve toward a constant or
steady solution. This steady solution $U(t) = U_c$, if it exists, satisfies $U'(t) = 0$,
and can therefore be calculated by solving $F(U_c) = 0$. The solution $U_c$ is called
a critical point. In the above example it is easy to compute: $U_c = (A, B/A)^T$.
The stability of the system can also be regarded as its ability to relax in finite
time to the steady state when a perturbation $\Delta(t) = U(t) - U_c$ is applied
to the solution. In order to study the influence of variations $\Delta(t)$, the right-hand
side of the system is linearized around the critical point using a Taylor
expansion:


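\[
U'(t) = F(U) = F(U_c) + \nabla F_{U=U_c}\,(U - U_c) + O(\|U - U_c\|^2),
\]

where

\[
\nabla F = \begin{pmatrix}
\dfrac{\partial F_1}{\partial X} & \dfrac{\partial F_1}{\partial Y} \\[2mm]
\dfrac{\partial F_2}{\partial X} & \dfrac{\partial F_2}{\partial Y}
\end{pmatrix}
= \begin{pmatrix} 2XY - (B+1) & X^2 \\ B - 2XY & -X^2 \end{pmatrix}.
\]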


Assuming small variations $\Delta(t)$, the term $O(\|U - U_c\|^2)$ can be neglected,
leading to the linear differential system

\[
\begin{cases}
\Delta'(t) = J\Delta(t), \\
\Delta(0) = \Delta_0,
\end{cases}
\tag{2.3}
\]





where the Jacobian matrix $J = \nabla F_{U=U_c}$ is in this case

\[
J = \begin{pmatrix} B-1 & A^2 \\ -B & -A^2 \end{pmatrix}.
\]
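
As a sanity check of the linearization, the analytic Jacobian can be compared with a finite-difference approximation of $\nabla F$ at the critical point. The sketch below is only illustrative: the parameter values, the step `e`, and all variable names are our own choices.

```matlab
A = 1; B = 0.9;                                   % sample parameter values
F = @(U) [A-(B+1)*U(1)+U(1)^2*U(2); ...           % right-hand side F(U) of (2.2)
          B*U(1)-U(1)^2*U(2)];
Uc = [A; B/A];                                    % critical point
J  = [B-1, A^2; -B, -A^2];                        % analytic Jacobian at Uc
Jfd = zeros(2); e = 1e-6;                         % central finite differences
for j = 1:2
    d = zeros(2,1); d(j) = e;
    Jfd(:,j) = (F(Uc+d)-F(Uc-d))/(2*e);           % j-th column of the Jacobian
end
fprintf('|F(Uc)| = %.1e, |J - Jfd| = %.1e\n', norm(F(Uc)), norm(J-Jfd));
```

Both printed numbers should be close to zero: the first confirms that $U_c$ is a critical point, the second that $J$ is the derivative of $F$ there.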
In the case that $J$ is diagonalizable, it can be decomposed as $J = MDM^{-1}$,
where $D_{ij} = \lambda_i \delta_{ij}$, and its integer powers are $J^n = MD^nM^{-1}$ for $n > 0$. We


recall the definition of the exponential of a matrix J (see for instance Allaire 
and Kaber (2006)): 


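\[
e^J = \sum_{n=0}^{\infty} \frac{1}{n!} J^n
    = \sum_{n=0}^{\infty} \frac{1}{n!} M D^n M^{-1}
    = M \left( \sum_{n=0}^{\infty} \frac{1}{n!} D^n \right) M^{-1}
    = M e^D M^{-1},
\]

where $e^D$ is the diagonal matrix $(e^D)_{ij} = \delta_{ij} e^{\lambda_i}$, formed out of the exponentials
of the eigenvalues of the matrix $J$. With this definition, the differential system
(2.3) can be directly integrated:

\[
\Delta(t) = e^{tJ} \Delta_0. \tag{2.4}
\]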


The long-time behavior is obtained by making $t \to +\infty$ in the exact solution
(2.4) of the linearized system (2.3). If all eigenvalues $\lambda$ of $J$ have negative real
part, then $e^{\lambda t} \to 0$ as $t \to +\infty$. Therefore the matrix $e^{tJ} = M e^{tD} M^{-1} \to 0$
as $t \to +\infty$ and the solution $\Delta(t)$ goes to 0. The Taylor expansion around the
critical point is valid in this case, and the solution of the nonlinear system
(2.2) tends toward the critical point.
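
The identity $e^{tJ} = Me^{tD}M^{-1}$ and the decay of $\Delta(t)$ can be observed numerically. In the sketch below, the parameter values and the initial perturbation are arbitrary choices of ours; `expm` is the MATLAB matrix exponential and `eig` returns the matrices $M$ and $D$.

```matlab
A = 1; B = 0.9;                    % a case with B < A^2+1
J = [B-1, A^2; -B, -A^2];          % Jacobian at the critical point
[M,D] = eig(J);                    % J = M*D*inv(M) when J is diagonalizable
Delta0 = [0.1; -0.05];             % an arbitrary small initial perturbation
for t = [0 1 10 50]
    E1 = expm(t*J);                        % built-in matrix exponential
    E2 = real(M*diag(exp(t*diag(D)))/M);   % M*exp(t*D)*inv(M), real up to rounding
    fprintf('t = %4.1f  |E1-E2| = %.1e  |Delta(t)| = %.3e\n', ...
            t, norm(E1-E2), norm(E1*Delta0));
end
```

For these values the eigenvalues are complex conjugates with real part $-0.55$, so the norm of $\Delta(t)$ decreases roughly like $e^{-0.55t}$.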

In this very simple example, the eigenvalues of the matrix J can be explic- 
itly calculated as the roots of the characteristic polynomial. The reader can 
easily verify that they are 

\[
\lambda_\pm = \frac{B - A^2 - 1 \pm \sqrt{\Delta}}{2}, \quad \text{with } \Delta = (A^2 - B + 1)^2 - 4A^2,
\]

and that their real part is negative if $B < A^2 + 1$.
The numerical method described in the following exercise can also be used to 
provide a stability criterion. This can be useful in a more general case when 
the eigenvalues cannot be calculated explicitly. 
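
Before turning to the exercise, the closed-form eigenvalues can be checked against the MATLAB function `eig` for one sample pair $(A, B)$. This is only a verification sketch with values chosen by us; here $\Delta$ denotes the discriminant, not the perturbation.

```matlab
A = 1; B = 3.2;                              % sample values with B > A^2+1
J = [B-1, A^2; -B, -A^2];                    % Jacobian at the critical point
Delta  = (A^2-B+1)^2 - 4*A^2;                % discriminant of the characteristic polynomial
lambda = (B-A^2-1 + [1 -1]*sqrt(Delta))/2;   % closed form; sqrt is complex when Delta < 0
ev = eig(J);                                 % numerical eigenvalues
fprintf('formula: %.4f%+.4fi   eig: %.4f%+.4fi\n', ...
        [real(lambda); imag(lambda); real(ev.'); imag(ev.')]);
```

Both columns should show the same pair of eigenvalues (possibly in a different order), with a positive real part since $B > A^2 + 1$.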


Exercise 2.1. Write a program to display, as a function of B, the maximum 
of the real part of the eigenvalues of the matrix J. The parameter A is kept 
constant. Mark out the value of the stability criterion, which is the abscissa 
at which the curve crosses the horizontal axis. 

A solution of this exercise is proposed in Sect. 2.5 at page 41. 


In order to solve the system of differential equations (2.2), we could imple- 
ment one of the numerical integration schemes proposed in Table 1.1, Chap. 
1. Another possibility is to use already available programs, for instance one 
of the ODE solvers proposed in MATLAB. 


Exercise 2.2. Compute the approximated solutions for different choices of
parameter $A$ corresponding to stability and instability. In each case display
graphically the solutions X and Y as a function of time and in another figure,
Y as a function of X, that is, the parametric curve $(X(t), Y(t))_t$.

A solution of this exercise is proposed in Sect. 2.5 at page 42. 


2.3 Model for the Maintained Reaction 


Consider now the system (2.1) with the hypothesis that component B is in- 
jected in the mixture at rate v. The concentration of B as a function of time 
is denoted by Z(t). The system of chemical reactions reduces to a new system 
of three equations: 


\[
\begin{cases}
X' = A - (Z+1)X + X^2 Y, \\
Y' = XZ - X^2 Y, \\
Z' = -XZ + v.
\end{cases}
\tag{2.5}
\]


2.3.1 Existence of a Critical Point and Stability 


The problem (2.5) now admits a steady solution corresponding to the critical
point $U_c = (A, v/A^2, v/A)^T$. The Jacobian matrix of the right-hand-side
function of the system (2.5) is

\[
\nabla F = \begin{pmatrix}
2XY - (Z+1) & X^2 & -X \\
Z - 2XY & -X^2 & X \\
-Z & 0 & -X
\end{pmatrix}.
\]

In order to study the stability of the system, this matrix is evaluated at the
critical point:

\[
J = \begin{pmatrix}
v/A - 1 & A^2 & -A \\
-v/A & -A^2 & A \\
-v/A & 0 & -A
\end{pmatrix}.
\]


Exercise 2.3. Find numerical values of v corresponding to stable or unstable 
behavior of the system. Hint: the numerical method proposed in Exercise 2.1
can be used again. 

A solution of this exercise is proposed in Sect. 2.5 at page 43. 


2.3.2 Numerical Solution 


Exercise 2.4. Solve the system (2.5) numerically for the following values of 
v: 0.9, 1.3, and 1.52. 

For each case, display in separate figures the three concentrations versus time 
and concentrations Y and Z versus X. 

A solution of this exercise is proposed in Sect. 2.5 at page 44. 


2.4 Model of Reaction with a Delay Term 


An example of a more complicated chemical reaction is proposed in Hairer, 
Norsett, and Wanner (1987). An additional component $I$ is introduced at a
constant rate into the system, initiating a chain reaction:

\[
I \longrightarrow Y_1 \xrightarrow{\;z\;} Y_2 \xrightarrow{\;k_2\;} Y_3 \xrightarrow{\;k_3\;} Y_4 \xrightarrow{\;k_4\;}
\]

The quantity of final product $Y_4$ slows down the first step of the reaction,
$Y_1 \to Y_2$. A fine modeling of this process, taking into account the transport
time and diffusion properties of molecules, leads to a delayed ODE system:





\[
\begin{cases}
y_1'(t) = I - z(t)\,y_1(t), \\
y_2'(t) = z(t)\,y_1(t) - y_2(t), \\
y_3'(t) = y_2(t) - y_3(t), \\
y_4'(t) = y_3(t) - 0.5\,y_4(t), \\
z(t) = \dfrac{1}{1 + \alpha\,y_4(t - t_d)^3},
\end{cases}
\tag{2.6}
\]

where $t_d$ is the time delay parameter. This system has a critical point $Y_c$,
which is, once again, determined by solving $y'(t) = 0$:

\[
Y_c = \begin{pmatrix} I(1 + 8\alpha I^3) \\ I \\ I \\ 2I \end{pmatrix}. \tag{2.7}
\]


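As in the previous section, the system can be linearized around this point.
The stability of the resulting system can then be studied by introducing a
fifth variable $y_5(t) = y_4(t - t_d)$. The Jacobian of the right-hand-side function

\[
F(y) = \begin{pmatrix}
I - \dfrac{y_1}{1 + \alpha y_5^3} \\[2mm]
\dfrac{y_1}{1 + \alpha y_5^3} - y_2 \\[2mm]
y_2 - y_3 \\
y_3 - 0.5\,y_4
\end{pmatrix}
\]

can then be easily calculated at the critical point:

\[
\nabla F(Y_c) = \begin{pmatrix}
-\bar z & 0 & 0 & 0 & 12\alpha I^3 \bar z \\
\bar z & -1 & 0 & 0 & -12\alpha I^3 \bar z \\
0 & 1 & -1 & 0 & 0 \\
0 & 0 & 1 & -0.5 & 0
\end{pmatrix},
\]

where

\[
\bar z = \frac{1}{1 + 8\alpha I^3}.
\]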

The small variations $\Delta(t)$ around the critical point $Y_c$ satisfy to a first-order
approximation the linear system of ODEs

\[
\begin{cases}
\Delta_1'(t) = -\bar z \Delta_1(t) + 12\alpha I^3 \bar z\, \Delta_4(t - t_d), \\
\Delta_2'(t) = \bar z \Delta_1(t) - \Delta_2(t) - 12\alpha I^3 \bar z\, \Delta_4(t - t_d), \\
\Delta_3'(t) = \Delta_2(t) - \Delta_3(t), \\
\Delta_4'(t) = \Delta_3(t) - 0.5\,\Delta_4(t).
\end{cases}
\tag{2.8}
\]

An expression for $\Delta(t)$ is sought of the form $\Delta(t) = V e^{xt}$, where $V$ is a
constant vector of $\mathbb{R}^4$. Plugging this ansatz into (2.8) leads to the characteristic
equation

\[
(x + 1)^2 (x + 0.5)(x + \bar z) + 12\alpha \bar z I^3 x e^{-t_d x} = 0. \tag{2.9}
\]

The corresponding system is stable if all roots have a negative real part.


Exercise 2.5. Set $\alpha = 0.0005$, $t_d = 4$, and solve numerically the characteristic
equation for $x \in \mathbb{C}$ for different values of $I$ between 0 and 20. Estimate a
minimum value of the parameter $I$ beyond which unstable equilibrium solutions
can be obtained.

A solution of this exercise is proposed in Sect. 2.5 at page 45. 
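
In order to illustrate numerically the instability phenomenon, the full system
(2.6) has to be integrated. Standard solvers for ODE cannot be used, since
they assume the generic form

\[
\begin{cases}
y'(t) = F(t, y(t)), \\
y(0) = u_0,
\end{cases}
\tag{2.10}
\]

whereas in our case the right-hand side depends on the solution at a previous
time,

\[
\begin{cases}
y'(t) = G(t, y(t), y(t - t_d)), \\
y(0) = u_0.
\end{cases}
\tag{2.11}
\]

In this example the function $G$ is the vector function

\[
G : \mathbb{R} \times \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}^4, \qquad
(t, u, v) \mapsto G(t, u, v) = \begin{pmatrix}
I - \dfrac{u_1}{1 + \alpha v_4^3} \\[2mm]
\dfrac{u_1}{1 + \alpha v_4^3} - u_2 \\[2mm]
u_2 - u_3 \\
u_3 - 0.5\,u_4
\end{pmatrix}.
\tag{2.12}
\]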


Numerical schemes well adapted to systems of standard type (2.10) have to
be modified to handle the time delay. We start with the simplest case of the
explicit Euler scheme:

    initialization: $y_0 = u_0$
    for $i = 0, 1, \ldots, n-1$ do
        $y_{i+1} = y_i + hF(t_i, y_i)$                          (2.13)
    end

This scheme provides a first-order approximation $y_n \approx y(t_n)$, with $t_n = nh$,
for $h$ sufficiently small (see Butcher (1987)). It can be easily adapted to the
system with delay term (2.11) if the time delay $t_d$ (here $t_d = 4$) is an integer
multiple of the time step $h$, i.e., $t_d = dh$ with $d \in \mathbb{N}$:

    initialization: $y_0 = u_0$
    for $i = 0, 1, \ldots, n-1$ do
        $\gamma_i = u_0$ if $i < d$, $\gamma_i = y_{i-d}$ elsewhere
        $y_{i+1} = y_i + hG(t_i, y_i, \gamma_i)$                (2.14)
    end


Exercise 2.6.

1. The solution is supposed to be constant and equal to $y_0$ for all $t \le 0$.
Write a function ODE_DelayEnzyme(t,Y,h,y0) returning the value
$G(t, y(t), y(t - t_d))$ given by (2.12). Hints: the values of $y$ are
discretized with the time step $h = t_{max}/n$ and stored in an array Y. The
delay parameter $t_d$ is a global variable set equal to 4 in the calling script.

2. Write a function ODE_EulerDelay(fdelay,tmax,nmax,y0) implementing
the algorithm (2.14) to compute an approximation of the solution at
time tmax in nmax time steps. The name of the right-hand-side function
ODE_DelayEnzyme is passed as input argument fdelay.

3. Write a main script to integrate the system using the algorithm (2.14)
up to $t_{max} = 160$. The value of $\alpha$ is fixed at 0.0005 and $I$ is chosen in
the range corresponding to instability. The initial condition $u_0$ should be
chosen close to the equilibrium solution $Y_c$. Display graphically the four
components of the solution versus time in one figure and the components
$y_i$, $i = 2, \ldots, 4$, versus the component $y_1$ in another figure.

A solution of this exercise is proposed in Sect. 2.5 at page 46. 


In the case of a Runge-Kutta-type scheme, intermediate values needed for
the computation of $y_{i+1}$ must be stored. The standard fourth-order Runge-Kutta
scheme presented in Chap. 1 will be adapted to our problem. We start
by rewriting the scheme so as to compute explicitly the intermediate parameters
of the right-hand-side function instead of the values of the function
itself:

    initialization: $y_0 = u_0$
    for $i = 0, 1, \ldots, n-1$ do
        $g^1 = y_i$,
        $g^2 = y_i + \frac{h}{2} F(t_i, g^1)$,
        $g^3 = y_i + \frac{h}{2} F(t_i + \frac{h}{2}, g^2)$,
        $g^4 = y_i + h F(t_i + \frac{h}{2}, g^3)$,
        $y_{i+1} = y_i + \frac{h}{6} \left( F(t_i, g^1) + 2F(t_i + \frac{h}{2}, g^2)
                   + 2F(t_i + \frac{h}{2}, g^3) + F(t_i + h, g^4) \right)$.        (2.15)
    end


To adapt this algorithm to the case (2.11) the values $g^k$, $k = 1, \ldots, 4$, should
be stored as functions of time. They are needed to compute the intermediate
values for the third input argument of the system function $G$, which holds the
values of the solution at the delayed time $t - t_d$. This leads to the following
algorithm:

    initialization: $y_0 = u_0$
    for $i = 0, 1, \ldots, n-1$ do
        $g_i^1 = y_i$,
        $g_i^2 = y_i + \frac{h}{2} G(t_i, g_i^1, \gamma_i^1)$,
        $g_i^3 = y_i + \frac{h}{2} G(t_i + \frac{h}{2}, g_i^2, \gamma_i^2)$,
        $g_i^4 = y_i + h G(t_i + \frac{h}{2}, g_i^3, \gamma_i^3)$,
        $y_{i+1} = y_i + \frac{h}{6} \left( G(t_i, g_i^1, \gamma_i^1) + 2G(t_i + \frac{h}{2}, g_i^2, \gamma_i^2)
                   + 2G(t_i + \frac{h}{2}, g_i^3, \gamma_i^3) + G(t_i + h, g_i^4, \gamma_i^4) \right)$,   (2.16)
        where $\gamma_i^k = u_0$ if $i + c_k < d$, and $\gamma_i^k = g_{i-d}^k$ otherwise,
        with $c = (0 \;\; 0.5 \;\; 0.5 \;\; 1)$.
    end


Exercise 2.7.

1. Write a function ODE_DelayRungeKutta with input arguments
fdelay, tmax, nmax, and y0 implementing the algorithm (2.16) to
compute an approximation of the solution at time tmax in nmax time steps.

2. Compare graphically the solutions obtained using respectively the Euler
and Runge-Kutta schemes. For $t_{max} = 16$, plot the two solutions obtained
using $n_{max} = 100$ and $n_{max} = 1000$. Compute the solution for $n_{max} = 5000$
and store it as reference solution. Compute the error in $L^2$ norm, as a
function of $h$, by performing several calculations for different values of
$n_{max}$ varying between 100 and 2000.

3. Study the influence of the initial condition: plot the trajectories $y_i$,
$i = 2, \ldots, 4$, as functions of $y_1$ with different colors for different initial
conditions.

A solution of this exercise is proposed in Sect. 2.5 at page 46. 


2.5 Solutions and Programs 


Solution of Exercise 2.1 

To illustrate graphically the stability criterion, the script ODE_stab2comp.m
computes the eigenvalues of the Jacobian matrix of F at the critical point,
using the MATLAB built-in function eig. The maximum value of the real
part of these eigenvalues is computed and stored for different values of the
parameter B, the parameter A remaining fixed. An approximation of the
stability criterion is the abscissa where the maximum value of the real part
changes sign.
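
The procedure just described can be sketched in a few lines. This is not the listing of ODE_stab2comp.m, which is not reproduced here; the range of $B$, the grid, the small threshold guarding against rounding at the crossing, and the variable names are our own assumptions.

```matlab
A  = 1;                                   % fixed parameter
Bv = linspace(0, 4, 401);                 % sampled values of B (step 0.01)
maxre = zeros(size(Bv));
for k = 1:numel(Bv)
    J = [Bv(k)-1, A^2; -Bv(k), -A^2];     % Jacobian at the critical point
    maxre(k) = max(real(eig(J)));         % largest real part of the eigenvalues
end
kc    = find(maxre > 1e-12, 1);           % first sample with an unstable eigenvalue
Bcrit = Bv(kc);                           % numerical estimate of the stability limit
fprintf('estimated limit B = %.2f, theory A^2+1 = %.2f\n', Bcrit, A^2+1);
```

A call such as `plot(Bv, maxre)` then displays the curve; the estimate is accurate only up to the grid step in $B$.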


Solution of Exercise 2.2 
The script ODE_Chemistry2.m uses the ODE solver ode45 available in the
MATLAB main distribution to integrate numerically the system of ODEs (2.2):


global A
global B
fun=@ODE_fun2;   % right-hand-side function, written in ODE_fun2.m
A=1;
B=0.9;
%
U0=[2;1];        % initial condition
t0=0;            % initial time
t1=10;           % final time
[timeS1,solS1]=ode45(fun,[t0,t1],U0);


The above example corresponds to a stable case. The MATLAB function 
ode45 requires as input the following parameters: 


• the right-hand-side vector function of the differential system (written in
ODE_fun2.m),

• the time interval [t0,t1] over which the system is integrated,

• the solution U0 at initial time t0.


It returns as output the array timeS1 of discrete intermediate times at which
the solver has computed the corresponding solution solS1.

The A and B parameters of the differential system are declared as global in the
main script and in the right-hand-side function ODE_fun2. Therefore they do
not need to be included in the list of input parameters of ODE_fun2 when calling
ode45. We first run the script with parameters corresponding to stability
($A = 1$ and $B = 0.9$), then with parameters corresponding to instability
($A = 1$ and $B = 3.2$). In the first case, the concentrations tend, for
large times, to a constant value, which is the critical point. This behavior
is illustrated in Fig. 2.1. In the left figure, (a), the trend of concentrations
versus time is represented, showing that they rapidly stabilize to their critical
values. The right figure, (b), shows the behavior of component Y versus the
component X for two different initial conditions. The two trajectories converge
toward the same critical point of coordinates $(A, B/A) = (1, 0.9)$.
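
The stable case can be reproduced without the file ODE_fun2.m, which is not listed here. The sketch below is a possible variant of ours, not the book's script: the right-hand side is an anonymous function that captures $A$ and $B$ (so no global variables are needed), and the final time 50 is an arbitrary choice.

```matlab
A = 1; B = 0.9;                                  % stable case of the text
rhs = @(t,U) [A-(B+1)*U(1)+U(1)^2*U(2); ...      % same F(U) as in (2.2)
              B*U(1)-U(1)^2*U(2)];
[t1,s1] = ode45(rhs, [0 50], [2;1]);             % first initial condition
[t2,s2] = ode45(rhs, [0 50], [0.5;0.5]);         % second initial condition
disp([s1(end,:); s2(end,:)]);                    % both rows should be close to (A, B/A)
```

Plotting `s1(:,2)` against `s1(:,1)` and `s2(:,2)` against `s2(:,1)` gives trajectories of the kind described for panel (b).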

The second choice of parameters, corresponding to instability, is illustrated 
in Fig. 2.2. In the left figure (a) the concentrations are displayed as functions 
of time. They remain bounded but exhibit periodic behavior. If the simulation 
is run over a long enough time, the graph of Y versus X represents the limit 
cycle. Figure 2.2 (b) numerically illustrates that this cycle does not depend 
on initial conditions but only on the parameters A and B. As they get closer
to the instability limit ($B = A^2 + 1$), the limit cycle becomes smaller and
eventually collapses into the critical point. This phenomenon is called Hopf
bifurcation (see Hairer, Norsett, and Wanner (1987) for details).


Fig. 2.1. Simplified Brusselator model, stable case $A = 1$, $B = 0.9$. (a) Concentrations
X and Y as a function of time. (b) Parametric curves $(X,Y)_t$ for two different
initial conditions, $(2,1)^T$ and $(0.5,0.5)^T$.


Fig. 2.2. Simplified Brusselator model, unstable case $A = 1$, $B = 3.2$. (a) Concentrations
X and Y as a function of time. (b) Parametric curves $(X,Y)_t$ for two
different initial conditions, $(2,1)^T$ and $(2,3)^T$.


Solution of Exercise 2.3 

The script ODE_stab2comp.m that was written to answer Exercise 2.1 is mod- 
ified in order to find the values of the parameter v for which all eigenvalues 
of the Jacobian matrix J have a negative real part. From the figure displayed 
by the script ODE_stab3comp.m we find that only the first value v = 0.9 pro- 
posed in Exercise 2.4 corresponds to a stable case. For the values v = 1.3 and 
1.52 some of the eigenvalues have a positive real part. 
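
These statements can be checked directly on the Jacobian evaluated at the critical point $(A, v/A^2, v/A)$. The sketch below is ours and assumes $A = 1$, the value consistent with the equilibrium quoted in the solution of Exercise 2.4.

```matlab
A = 1;                                        % assumed value of A
vv = [0.9 1.3 1.52];                          % the three values of Exercise 2.4
maxre = zeros(size(vv));
for k = 1:numel(vv)
    v = vv(k);
    J = [v/A-1, A^2, -A; ...                  % Jacobian at (A, v/A^2, v/A)
         -v/A, -A^2,  A; ...
         -v/A,    0, -A];
    maxre(k) = max(real(eig(J)));             % largest real part of the eigenvalues
    fprintf('v = %.2f   max Re(lambda) = %+.4f\n', v, maxre(k));
end
```

Only the first value gives a negative maximum real part.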

Solution of Exercise 2.4


The integration of the full system with three equations is performed in
the script ODE_Chemistry3.m. The right-hand-side function is defined in
ODE_fun3.m. For $v = 0.9$, the system is stable; therefore all three concentrations
tend toward their equilibrium value ($U_c = (1, 0.9, 0.9)^T$) as shown in
Fig. 2.3 (a). The right figure (b), which shows the variations of Y as a function
of X, also points out the convergence toward the critical point, starting from
several different initial conditions.

For $v = 1.3$, the system is unstable, but the concentrations remain
bounded. Their variation as a function of time is periodic and tends toward
a limit cycle as displayed in Fig. 2.4.


Fig. 2.3. Brusselator model, stable case $v = 0.9$. (a) Concentrations X, Y, and Z as
a function of time. (b) Parametric curves $(X,Y)_t$ for two different initial conditions,
$(1,2,1)^T$ and $(2,2,2)^T$.


Fig. 2.4. Brusselator model, unstable periodic case $v = 1.3$. (a) Concentrations
X, Y, and Z as a function of time. (b) Parametric curves $(X,Y)_t$ for two different
initial conditions, $(1,2,1)^T$ and $(2,2,2)^T$.


Fig. 2.5. Brusselator model, unstable divergent case $v = 1.52$. (a) Concentrations
X, Y, and Z as a function of time. (b) Parametric curves $(X,Y)_t$ for two different
initial conditions, $(1,2,1)^T$ and $(2,2,2)^T$.


Finally, for $v > 1.5$, the system is unstable and divergent: the values
of the concentrations Y and Z are unbounded for large times while the
concentration X goes to 0. The global behavior is completely different from
the previous case. In particular, there is no limit cycle of Y as a function of X
or of Z as a function of X.


Solution of Exercise 2.5 

This nonlinear equation can be solved numerically using the MATLAB
function fsolve, as proposed in the script ODE_StabDelay.m displayed
below:


clear      % important to reinitialize the MATLAB square root of (-1)
td=4;
alpha=0.0005;
for I=0:20
    bz=1/(1+8*alpha*I^3);
    % left-hand side of the characteristic equation (2.9)
    funequi=@(x) (x+1)^2*(x+0.5)*(x+bz)+12*alpha*I^3*bz*x*exp(-td*x);
    guess=i/2;
    x0=fsolve(funequi,guess,optimset('Display','off'));
    fprintf('I=%f x0=%f+i%f\n',I,real(x0),imag(x0))
end


Running the script with a real value as initial guess for the fsolve function 
(guess=2 for instance) will provide a negative real solution. This corresponds 
to a stable equilibrium, since deviations from the critical point decay expo- 
nentially to zero. Conversely, if we run the script with a pure imaginary initial 
guess, the root found by the solver is complex, with a nonzero imaginary part. 
Choose guess=i/2 as in the above example, and let $I$ vary to obtain a solution
with a real part that will be positive for values of $I > 9$. The equilibrium for
this parameter choice is unstable, since the deviations increase exponentially.
On the other hand, stability is not ensured for $I < 9$, since not all the roots
of equation (2.9) necessarily have a negative real part.


Solution of Exercise 2.6 
The main script ODE_Enzyme.m calls the function ODE_EulerDelay, which
implements the Euler scheme (2.14) adapted to the delayed equation. The
right-hand-side function $G(t, y, \gamma)$ is programmed in the file
ODE_DelayEnzyme.m. The selected value I=10 corresponds to an instability. The initial
condition is fixed by adding a small deviation to the unstable equilibrium
solution (2.7). In Fig. 2.6 (b) the trajectories are superimposed, which indicates
the periodic character of the solution. The length of the period can be
graphically estimated in Fig. 2.6 (a) to a value close to 13, which roughly
corresponds to one of the phases obtained in solving the characteristic equation
(2.9).

In contrast, setting I=5, which is a value for which no unstable linear
equilibrium could be found by running the script ODE_StabDelay, we observe that
the solutions tend to equilibrium.

Fig. 2.6. Solutions of the system (2.6) obtained using the Euler scheme. (a) $y(t)$
versus $t$. (b) Trajectories $y_i$, $i = 2, \ldots, 4$, versus $y_1$.
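
The files ODE_EulerDelay.m and ODE_DelayEnzyme.m are not listed here. A compact, self-contained sketch of the scheme (2.14) applied to (2.12) could look as follows; it is our own illustration, written as a script with an anonymous function instead of the function files and global variable of the exercise. The time step and the 5% perturbation are arbitrary choices, and $I = 5$ is the case for which the text reports convergence to equilibrium.

```matlab
alpha = 0.0005; td = 4; I = 5;                 % I = 5: case reported as stable in the text
G = @(t,u,v) [I - u(1)/(1+alpha*v(4)^3); ...   % right-hand side (2.12)
              u(1)/(1+alpha*v(4)^3) - u(2); ...
              u(2) - u(3); ...
              u(3) - 0.5*u(4)];
tmax = 160; n = 3200; h = tmax/n;              % time step h = 0.05
d  = round(td/h);                              % delay counted in time steps, td = d*h
Yc = [I*(1+8*alpha*I^3); I; I; 2*I];           % equilibrium (2.7)
u0 = 1.05*Yc;                                  % perturbed initial (and past) state
Y  = zeros(4,n+1); Y(:,1) = u0;                % Y(:,i+1) approximates y(t_i)
for i = 0:n-1
    if i < d
        gam = u0;                              % t_i - td < 0: constant history
    else
        gam = Y(:,i-d+1);                      % stored value y_{i-d}
    end
    Y(:,i+2) = Y(:,i+1) + h*G(i*h, Y(:,i+1), gam);
end
fprintf('distance to equilibrium at tmax: %.2e\n', norm(Y(:,end)-Yc));
```

Changing `I` to 10 in the first line should instead produce sustained oscillations of the kind shown in Fig. 2.6.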


Solution of Exercise 2.7 

The delayed system of equations is now integrated with the fourth-order
Runge-Kutta scheme programmed in the function ODE_RungeKuttaDelay.
Calls to ODE_EulerDelay should be replaced by ODE_RungeKuttaDelay in the
script ODE_Enzyme.m. This is done by changing the assignment of the variable
scheme. In order to implement the Runge-Kutta scheme for delayed
ODEs (2.16), we introduce a triple-index array g(:,n,k) for $k = 1, \ldots, 4$. It
is used to store the four intermediate values $(g^k)_{k=1}^4$ as a function of time,
so that it can be passed as input parameter to the right-hand-side function
ODE_DelayEnzyme.
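
A possible layout of the triple-index array is sketched below. This is our own self-contained illustration, not the listing of ODE_RungeKuttaDelay.m. One point is an assumption of ours: the constant history $u_0$ is used for all four stages as long as $i < d$, which differs from the test $i + c_k < d$ only for the last stage of step $i = d-1$, where the delayed time is exactly 0.

```matlab
alpha = 0.0005; td = 4; I = 5;                 % I = 5: case reported as stable in the text
G = @(t,u,v) [I - u(1)/(1+alpha*v(4)^3); ...   % right-hand side (2.12)
              u(1)/(1+alpha*v(4)^3) - u(2); ...
              u(2) - u(3); ...
              u(3) - 0.5*u(4)];
tmax = 160; n = 1600; h = tmax/n; d = round(td/h);
Yc = [I*(1+8*alpha*I^3); I; I; 2*I];           % equilibrium (2.7)
u0 = 1.05*Yc;                                  % perturbed initial (and past) state
c  = [0 0.5 0.5 1];                            % stage abscissas
Y  = zeros(4,n+1); Y(:,1) = u0;
g  = zeros(4,n,4);                             % g(:,i+1,k) stores the stage value g_i^k
for i = 0:n-1
    t = i*h; yi = Y(:,i+1);
    gam = repmat(u0,1,4);                      % delayed stage values: history by default
    if i >= d
        gam = squeeze(g(:,i-d+1,:));           % g_{i-d}^k for k = 1..4 (4-by-4 matrix)
    end
    g(:,i+1,1) = yi;
    k1 = G(t,        g(:,i+1,1), gam(:,1));
    g(:,i+1,2) = yi + h/2*k1;
    k2 = G(t+c(2)*h, g(:,i+1,2), gam(:,2));
    g(:,i+1,3) = yi + h/2*k2;
    k3 = G(t+c(3)*h, g(:,i+1,3), gam(:,3));
    g(:,i+1,4) = yi + h*k3;
    k4 = G(t+c(4)*h, g(:,i+1,4), gam(:,4));
    Y(:,i+2) = yi + h/6*(k1+2*k2+2*k3+k4);     % update of (2.16)
end
fprintf('distance to equilibrium at tmax: %.2e\n', norm(Y(:,end)-Yc));
```

Storing all stages costs four times the memory of the Euler variant; only the last $d$ steps are actually needed, so a circular buffer would be a natural refinement.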

A more detailed study of the convergence order of the two schemes is proposed
in the script ODE_ErrorEnzyme.m. Reference solutions for each scheme
are computed with a very fine discretization, here nmax=5000. They are used as
exact solutions to evaluate the error on the solution at the final time $t_{max} = 50$
when coarser discretizations are used. In Fig. 2.7, the variations of the error
with the discretization parameter $h$ are represented in logarithmic scale, along
with the theoretical convergence orders $O(h)$ and $O(h^4)$ for comparison.

Fig. 2.7. Error in $L^2$ norm as a function of the time step at time $t = 50$ for Euler
and fourth-order Runge-Kutta schemes.


Finally, the influence of the initial condition is investigated by the script 
ODE_EnzymeCondIni. We display in the same figure the trajectories starting 
from different initial conditions, randomly chosen in the vicinity of the un- 
stable equilibrium. Figure 2.8 shows that after an initial phase (of different 
lengths), they all converge to the same periodic trajectory. 


Fig. 2.8. Solutions of system (2.6) obtained for four different initial conditions using
the Runge-Kutta scheme. (a) Trajectories $y_2$ versus $y_1$. (b) $y_3$ versus $y_1$.

Polynomial Approximation


Project Summary 


Level of difficulty: 1 


Keywords: Polynomial approximation, splines, best approxima- 
tions, interpolation 


Application fields: Approximation of functions 


This chapter is devoted to the approximation of a given real function by 
a simpler one that belongs, for example, to $\mathbb{P}_n$, the set of polynomials of
degree less than or equal to n. We also consider approximation by piecewise 
polynomial functions, that is, functions whose restrictions to some prescribed 
intervals are polynomials. The definitions and results of this chapter, given 
without proofs, are widely used in the rest of the book. We refer the reader to 
books on polynomial approximation theory, for instance Crouzeix and Mignot 
(1989), DeVore and Lorentz (1993), and Rivlin (1981). 


3.1 Introduction 


The approximation of a given function by a polynomial is an efficient tool in 
many problems arising in applied mathematics. In the following examples, f 
is the function to be approximated by a polynomial $p_n$. The precise meaning
of the word “approximated” will be explained later. 





1. Visualization of some computational results. Given the values of a function
$f$ at some points $x_i$, we want to draw this function on the interval $[a,b]$.
This is the interpolation problem if $[a,b] \subset [\min_i x_i, \max_i x_i]$; otherwise, it
is an extrapolation problem. The following approximation is often made:

\[
\forall x \in [a,b], \quad f(x) \approx p_n(x).
\]

2. Numerical quadrature: to compute an integral involving the function $f$,
the following approximation is used:

\[
\int_a^b f(x)\,dx \approx \int_a^b p_n(x)\,dx,
\]

since the computation of the last integral is easy.

3. Differential equations: in spectral methods, the solution of an ordinary or
partial differential equation is approximated by a polynomial. See Chap. 5.


To approximate $f$ by $p_n \in \mathbb{P}_n$ means:

1. Interpolation. The polynomial $p_n$ and the function $f$ coincide at $n + 1$
points $x_0, \ldots, x_n$ of the interval $[a,b]$. These points can be prescribed or
be some unknowns of the problem.

2. Best approximation. The polynomial $p_n$ is the element (or one element)
of $\mathbb{P}_n$ (if it exists) that is closest to $f$ with respect to some given norm
$\|.\|$. More precisely,

\[
\|f - p_n\| = \inf_{q \in \mathbb{P}_n} \|f - q\|.
\]

If the norm is

\[
\|\varphi\|_2 = \left( \int_a^b |\varphi(x)|^2\,dx \right)^{1/2},
\]

the approximation is called least squares approximation or approximation
in the $L^2$ sense or Hilbertian approximation. The norm of the uniform
convergence (the supremum norm), which we denote by

\[
\|\varphi\|_\infty = \sup_{x \in [a,b]} |\varphi(x)|,
\]

leads to the approximation in the uniform sense or approximation in the
$L^\infty$ sense or Chebyshev approximation.


3.2 Polynomial Interpolation 


In this section $f : [a,b] \to \mathbb{R}$ is a continuous function, $(x_i)_{i=0}^k$ a set of $k + 1$
distinct points in the interval $[a,b]$, and $(\alpha_i)_{i=0}^k$ a set of $k + 1$ integers. We
define $n = k + \alpha_0 + \cdots + \alpha_k$. We are interested in the following problem: find
a polynomial $p$ that coincides with $f$ and possibly with some derivatives of
$f$ at the points $x_i$. The integers $\alpha_i$ indicate the highest derivative of $f$ to be
interpolated at the point $x_i$.

3.2.1 Lagrange Interpolation


Lagrange polynomials correspond to the case that only the function $f$ is
interpolated and not its derivatives. In such a case $\alpha_i = 0$ for all $i$ and thus
$k = n$. We know from the theory of approximation the following important
result.


Theorem 3.1. Given $n + 1$ distinct points $x_0, x_1, \ldots, x_n$ and a continuous
function $f$, there exists a unique polynomial $p_n \in \mathbb{P}_n$ such that for all
$i = 0, \ldots, n$,

\[
p_n(x_i) = f(x_i). \tag{3.1}
\]

The polynomial $p_n$ is called the Lagrange polynomial interpolant of $f$ with
respect to the points $x_i$. We denote it by $I_n(f; x_0, \ldots, x_n)$ or simply $I_n f$. We
define the characteristic Lagrange polynomials associated with the points $x_i$
as the $n + 1$ polynomials $(\ell_i)_{i=0}^n$:

\[
\ell_i \in \mathbb{P}_n \quad \text{and} \quad \ell_i(x_j) = \delta_{ij} \quad \text{for } j = 0, \ldots, n.
\]

The Lagrange polynomials form a basis of $\mathbb{P}_n$ and are explicitly given by

\[
\ell_i(x) = \prod_{j=0,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}. \tag{3.2}
\]
The four Lagrange polynomials associated with the four points −1, 0, 1, and
3 are displayed in Fig. 3.1.

Fig. 3.1. Lagrange polynomials associated with the points −1, 0, 1, and 3.

The Lagrange basis is mainly used to write in a very simple way the Lagrange
polynomial interpolant:

\[
I_n f = \sum_{i=0}^{n} f(x_i)\,\ell_i. \tag{3.3}
\]
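
The product formula (3.2) and the expansion of the interpolant in the Lagrange basis translate almost literally into MATLAB. In the sketch below the points are those of Fig. 3.1, while the function $f = \cos$, the evaluation grid, and the variable names are our own illustrative choices.

```matlab
xi = [-1 0 1 3];                         % interpolation points of Fig. 3.1
f  = @(x) cos(x);                        % sample function (our choice)
x  = linspace(-1, 3, 201);               % evaluation points
n  = numel(xi) - 1;
p  = zeros(size(x));
for i = 0:n
    li = ones(size(x));                  % l_i(x) = prod_{j~=i} (x-x_j)/(x_i-x_j)
    for j = [0:i-1, i+1:n]
        li = li .* (x - xi(j+1)) / (xi(i+1) - xi(j+1));
    end
    p = p + f(xi(i+1)) * li;             % add the term f(x_i)*l_i(x)
end
fprintf('max |f - I_n f| on [-1,3]: %.3f\n', max(abs(p - f(x))));
```

The interpolant reproduces $f$ at the four points; away from them the error is controlled by the fourth derivative of $f$ and by the product $\prod_j (x - x_j)$.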
A question arises naturally: what is the most appropriate basis of $\mathbb{P}_n$ for the
computation of $I_n f$? We compare three bases.
• Basis 1. The canonical basis of the monomials $1, x, \ldots, x^n$.
• Basis 2. The basis given by the Lagrange polynomials.
• Basis 3. The basis given by the polynomials


\[
1,\ (x - x_0),\ (x - x_0)(x - x_1),\ \ldots,\ (x - x_0)(x - x_1) \cdots (x - x_{n-1}). \tag{3.4}
\]


Exercise 3.1. Computations in the canonical basis.
Let $(a_k)_{k=0}^n$ be the coefficients of $I_n f$ in the canonical basis,

\[
I_n f = \sum_{k=0}^{n} a_k x^k,
\]

and $a = (a_0, \ldots, a_n)^T \in \mathbb{R}^{n+1}$.

1. Prove that the interpolation conditions (3.1) are equivalent to a linear
system $Aa = b$, with matrix $A \in \mathbb{R}^{(n+1)\times(n+1)}$ and right-hand side
$b \in \mathbb{R}^{n+1}$ to be determined.

2. For $n = 10$ (and 20) define an array x of $n + 1$ random numbers sorted
in increasing order between 0 and 1. Write a program that computes the
matrix $A$.

3. For $f(x) = \sin(10\,x \cos x)$, compute the coefficients of $I_n f$ by solving the
linear system $Aa = b$. Plot on the same figure $I_n f$ and $f$ evaluated at the
points $x_i$. Use the MATLAB function polyval (warning: handle carefully
the ordering of the coefficients $a_i$).

4. For $n = 10$, compute $\|Aa - b\|_2$, then the condition number of the matrix
$A$ (use the function cond) and its rank (use the function rank). Same
questions for $n = 20$. Comment.








A solution of this exercise is proposed in Sect. 3.6 at page 70. 


Exercise 3.2. For $n$ going from 2 to 20 in steps of 2, compute the logarithm
of the condition number of the matrix $A$ (see the previous exercise) for $n + 1$
points $x_i$ uniformly chosen between 0 and 1. Plot the logarithm of the
condition number of the matrix as a function of $n$. Comment.
A solution of this exercise is proposed in Sect. 3.6 at page 71. 





Exercise 3.3. Computations in the Lagrange basis.

For $n \in \{5, 10, 20\}$, define the points $x_i = i/n$ for $i = 0, \ldots, n$. Write a program
(using $n$ and $k$ as input data) that computes the coefficients of the Lagrange
polynomial $\ell_k$. Use the function polyfit. Evaluate $\ell_{[n/2]}$ at 0. Comment.

A solution of this exercise is proposed in Sect. 3.6 at page 72. 

Let us now consider Basis 3. This basis is related to what is called the divided
difference. The divided difference of order $k$ of the function $f$ with respect to
$k + 1$ distinct points $x_0, \ldots, x_k$ is the real number denoted by $f[x_0, \ldots, x_k]$
and defined for $k = 0$ by $f[x_i] = f(x_i)$ and for $k \geq 1$ by

$$f[x_0, \ldots, x_k] = \frac{f[x_1, \ldots, x_k] - f[x_0, \ldots, x_{k-1}]}{x_k - x_0}.$$


The evaluation of the divided differences is computed by Newton's algorithm,
described by the following "tree":

$$\begin{array}{lll}
f[x_0] = f(x_0) & & \\
 & f[x_0, x_1] = \dfrac{f[x_1] - f[x_0]}{x_1 - x_0} & \\
f[x_1] = f(x_1) & & f[x_0, x_1, x_2] = \dfrac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} \\
 & f[x_1, x_2] = \dfrac{f[x_2] - f[x_1]}{x_2 - x_1} & \\
f[x_2] = f(x_2) & &
\end{array}$$

The first column of the tree contains the divided differences of order 0, the
second column contains the divided differences of order 1, and so on. The
following proposition shows that the divided differences are the coefficients of
$I_n f$ in Basis 3:





Proposition 3.1.

$$I_n f(x) = f[x_0] + \sum_{k=1}^{n} f[x_0, \ldots, x_k]\,(x - x_0)(x - x_1)\cdots(x - x_{k-1}). \qquad (3.5)$$

Let c be an array that contains the divided differences $c_i = f[x_0, \ldots, x_i]$. To
evaluate the polynomial $I_n f$ at a point $x$, we write

$$I_n f(x) = c_0 + (x - x_0)\{c_1 + (x - x_1)\{c_2 + c_3(x - x_2) + \cdots\}\}.$$

This way of writing $I_n f(x)$ is called the Horner form of the polynomial. It is a
very efficient method since the computation of $I_n f(x)$ in this form requires $n$
multiplications, $n$ subtractions, and $n$ additions, while the form (3.5) requires
$n(n + 1)/2$ multiplications, $n(n + 1)/2$ subtractions, and $n$ additions. Here is
Horner's algorithm for the evaluation of $I_n f(x)$:

    y = c_n
    for k = n-1 ↘ 0
        y = (x - x_k) y + c_k
    end

Exercise 3.4. Divided differences.

1. Write a program that computes the divided differences of order $n$ of a
   function. Start from an array c that contains the $(n + 1)$ values $f[x_i] = f(x_i)$.
   In the first step, $c_0 = f[x_0]$ is unchanged and all the other values
   $c_k$ ($k \geq 1$) are replaced by the divided differences of order 1. In the second
   step, $c_1 = f[x_0, x_1]$ is unchanged and the values $c_k$ ($k \geq 2$) are replaced
   by new ones, and so on. Here is the algorithm to implement:

    for k = 0 ↗ n
        c_k ← f(x_k)
    end
    for p = 1 ↗ n
        for k = n ↘ p
            c_k ← (c_k - c_{k-1})/(x_k - x_{k-p})
        end
    end

2. Use Horner's algorithm to evaluate $I_n f$ on a fine grid of points in [0,1].
   Draw $I_n f$ and $f$ on the same figure. In the same figure, mark the
   interpolation points $x_i$.


A solution of this exercise is proposed in Sect. 3.6 at page 72. 


We consider now the problem of the control of the Lagrange interpolation
error. Given $x \in [a,b]$, the goal is to evaluate the local error or pointwise
error

$$e_n(x) = f(x) - I_n f(x). \qquad (3.6)$$

Of course, if $x$ is an interpolation point, there is no error, and $e_n(x) = 0$.
Actually, the error is precisely known through the following result.

Proposition 3.2. Assume $f \in C^{n+1}([a,b])$. For all $x \in [a,b]$, there exists
$\xi_x \in [a,b]$ such that

$$e_n(x) = \frac{1}{(n+1)!}\,\Pi_n(x)\,f^{(n+1)}(\xi_x), \qquad (3.7)$$

with $\Pi_n(x) = \prod_{i=0}^{n} (x - x_i)$.

For all $x \in [a,b]$, we deduce from (3.7) the upper bound

$$|e_n(x)| \leq \frac{1}{(n+1)!}\,\|\Pi_n\|_\infty\,\|f^{(n+1)}\|_\infty.$$

This suggests that a good way to choose the interpolation points consists in
minimizing $\|\Pi_n\|_\infty$, since the term $\|f^{(n+1)}\|_\infty$ depends only on the function
and not at all on the interpolation points. Suppose the interpolation points
to be equidistant in the interval $[a,b]$:

$$x_i = a + i\,\frac{b - a}{n}, \qquad 0 \leq i \leq n.$$

In this case, there exists a constant $c$ independent of $n$ such that for $n$ large
enough,

$$\max_{x \in [a,b]} |\Pi_n(x)| \geq c\,(b - a)^{n+1} e^{-n} n^{-5/2}. \qquad (3.8)$$





Consider now the Chebyshev points. These are the $n$ zeros of the Chebyshev
polynomial $T_n$ defined on the interval $[-1,1]$ by

$$T_n(t) = \cos n\theta, \quad\text{with } \cos\theta = t. \qquad (3.9)$$

Hence the Chebyshev points are

$$t_i = \cos(\theta_i), \qquad \theta_i = \frac{\pi}{2n} + i\,\frac{\pi}{n}, \qquad 0 \leq i \leq n - 1.$$

On an interval $[a,b]$, the Chebyshev points (see Fig. 3.2) are defined as the
image of the previous points by the affine transformation $\varphi$ that maps $[-1,1]$
onto $[a,b]$:

$$x_i = \varphi(t_i) = \frac{a + b}{2} + \frac{b - a}{2}\cos(\theta_i), \qquad 0 \leq i \leq n - 1.$$








Whatever the points $x_i$ in $[a,b]$, the following lower bound holds:

$$\max_{x \in [a,b]} \prod_{i=0}^{n-1} |x - x_i| \geq \frac{(b - a)^n}{2^{2n-1}}, \qquad (3.10)$$

and the Chebyshev points attain this lower bound.

Comparing the bounds in (3.8) and (3.10) favors the Chebyshev points. We
will see in the next paragraph another reason to prefer these points to the
equidistant points.
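
The comparison between the two families of points can be observed numerically. The following sketch computes $\max |\prod_i (x - x_i)|$ on a fine grid for the same number N of equidistant and Chebyshev points. The interval, the value of N, the grid size, and all variable names are assumptions made for the illustration; the reference value $(b-a)^N / 2^{2N-1}$ is the classical minimal value for N points, which the Chebyshev points attain.

```matlab
% Sketch: compare max |prod (x - x_i)| for equidistant and Chebyshev points.
a = -1; b = 1; N = 11;                       % N points in [a,b] (assumed values)
xe = linspace(a, b, N);                      % equidistant points
theta = (2*(0:N-1) + 1) * pi / (2*N);        % angles of the zeros of T_N
xc = (a + b)/2 + (b - a)/2 * cos(theta);     % Chebyshev points mapped to [a,b]
xx = linspace(a, b, 2001);                   % fine grid used for the maximum
Pe = ones(size(xx)); Pc = ones(size(xx));
for i = 1:N
    Pe = Pe .* (xx - xe(i));
    Pc = Pc .* (xx - xc(i));
end
maxE  = max(abs(Pe));
maxC  = max(abs(Pc));
bound = (b - a)^N / 2^(2*N - 1);             % minimal value, attained by Chebyshev points
fprintf('equidistant: %.3e   Chebyshev: %.3e   bound: %.3e\n', maxE, maxC, bound);
```

The Chebyshev value should coincide with the bound up to rounding, while the equidistant value is larger.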

We introduce the Lebesgue constant associated with the points $(x_i)_{i=0}^{n}$; it
is the real number $\Lambda_n$ defined by

$$\Lambda_n = \max_{x \in [a,b]} \sum_{i=0}^{n} |\ell_i(x)|. \qquad (3.11)$$

It is important to notice that $\Lambda_n$ does not depend on any function, but only
on the points $x_i$. Let us suppose an error $\varepsilon_i$ for each value $f(x_i)$. Let $\tilde p_n$ be
the polynomial that interpolates the values $\tilde f_i = f_i + \varepsilon_i$. The interpolation
error at the point $x$ is $I_n f(x) - \tilde p_n(x) = -\sum_{i=0}^{n} \varepsilon_i \ell_i(x)$. If $\varepsilon = \max_i |\varepsilon_i|$ is the
maximal error on the values $f(x_i)$, we derive the upper bound

$$\|I_n f - \tilde p_n\|_\infty \leq \varepsilon \Lambda_n,$$

which shows that the constant $\Lambda_n$ is a measure of the amplification of the
error in the Lagrange interpolation process. In other words, it is the stability
measure of the Lagrange interpolation. The following negative result holds.

Proposition 3.3. Whatever the interpolation points,

$$\lim_{n \to +\infty} \Lambda_n = +\infty. \qquad (3.12)$$

Hence small perturbations on the data (small $\varepsilon$) can lead to very big variations
in the solution ($I_n f$). This is the typical case of an ill-conditioned problem.





Exercise 3.5. Computation of the Lebesgue constant.

1. Write a function that computes the Lebesgue constant associated with
   an array x of $n$ real numbers (see (3.11)). Use the MATLAB functions
   polyval and polyfit to evaluate $\ell_i$. Compute the maximum in (3.11) on
   a uniform grid of 100 points between $\min_i x_i$ and $\max_i x_i$.

2. The uniform case. Compute for $n$ going from 10 to 30 in steps of 5 the
   Lebesgue constant $\Lambda_U(n)$ associated with $n + 1$ equidistant points in the
   interval $[-1,1]$. Draw the curve $n \mapsto \ln(\Lambda_U(n))$. Comment.

3. The Chebyshev points case. Compute for $n$ going from 10 to 20 in steps
   of 5 the Lebesgue constant $\Lambda_T(n)$ associated with $n + 1$ Chebyshev points
   in $[-1,1]$. Draw the curve $\ln n \mapsto \Lambda_T(n)$. Comment.





A solution of this exercise is proposed in Sect. 3.6 at page 73. 
Concerning a uniform bound of the error (3.6), we have the following result.

Proposition 3.4. For any continuous function $f$ defined on $[a,b]$,

$$\|e_n\|_\infty \leq (1 + \Lambda_n)\,E_n(f),$$

with $E_n(f) = \inf_{q \in \mathbb{P}_n} \|f - q\|_\infty$ the error of best approximation of the function
$f$ by polynomials in $\mathbb{P}_n$, in the uniform norm.

Remark 3.1. Hence the global error $\|f - I_n f\|_\infty$ is bounded by the product
of two terms. One of them is $\Lambda_n$, which always goes to $+\infty$; the other is
$E_n(f)$, whose rate of convergence toward 0 increases with the smoothness
of $f$. Hence the Lagrange interpolation process converges uniformly if the
product $\Lambda_n E_n(f)$ goes to 0.


Exercise 3.6. Compute and draw (on a uniform grid of 100 points) the Lagrange
polynomial interpolation of the function $f_1 : x \mapsto |\sin(\pi x)|$ at $n$ Chebyshev
points of the interval $[-1,1]$. Take $n = 20$, 30, then 40. Do the same for
the function $f_2 : x \mapsto x f_1(x)$. Comment on the results.

A solution of this exercise is proposed in Sect. 3.6 at page 75. 


Exercise 3.7. Runge phenomenon. 

Compute and draw on a uniform grid of 100 points the Lagrange polynomial
interpolation of the function $f : x \mapsto 1/(x^2 + a^2)$ at the $n + 1$ points
$x_i = -1 + 2i/n$ ($i = 0, \ldots, n$). Take $a = 2/5$ and $n = 5$, 10, then 15. Note that the
function to be interpolated is very regular on $\mathbb{R}$, in contrast to the functions
considered in the previous exercise. Comment on the results.

A solution of this exercise is proposed in Sect. 3.6 at page 76. 


3.2.2 Hermite Interpolation 


We assume in this section that the function $f$ has derivatives of order $\alpha_i$ at
the point $x_i$. In this case there exists a unique polynomial $p_n \in \mathbb{P}_n$ such that
for all $i = 0, \ldots, k$ and $j = 0, \ldots, \alpha_i$,

$$p_n^{(j)}(x_i) = f^{(j)}(x_i), \qquad (3.13)$$

where $n + 1 = \sum_{i=0}^{k} (\alpha_i + 1)$ is the number of interpolation conditions.
The polynomial $p_n$, which we denote by $I_n(f; x_0, \ldots, x_k; \alpha_0, \ldots, \alpha_k)$, or
simply $I_n^H f$, is called the Hermite polynomial interpolation of $f$ at the points $x_i$
with respect to the indices $\alpha_i$.

Theorem 3.2. Suppose the function $f$ is in $C^{n+1}([a,b])$. For all $x \in [a,b]$,
there exists $\xi_x \in [\min_i x_i, \max_i x_i]$ such that

$$e_n^H(x) = f(x) - I_n^H f(x) = \frac{1}{(n+1)!}\,\Pi_n^H(x)\,f^{(n+1)}(\xi_x), \qquad (3.14)$$

with $\Pi_n^H(x) = \prod_{i=0}^{k} (x - x_i)^{1 + \alpha_i}$.

Since the function $f$ is of class $C^{n+1}$ on the interval $[a,b]$, for all $n + 1$ distinct
points $x_0, \ldots, x_n$ in the interval $[a,b]$, there exists $\xi \in [a,b]$ such that

$$f[x_0, \ldots, x_n] = \frac{1}{n!}\,f^{(n)}(\xi).$$

This relation defines a link between the divided differences and the derivatives.
More precisely, we make the following remark.

Remark 3.2. Letting each $x_i$ go to $x$, we get an approximation of the $n$th
derivative of $f$ at $x$:

$$\frac{1}{n!}\,f^{(n)}(x) = \lim_{x_i \to x} f[x_0, \ldots, x_n].$$

This remark combined with the Newton algorithm allows the evaluation of
the Hermite polynomial interpolation, as in the following example.


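Example 3.1. Compute the polynomial interpolant $p$ of minimal degree satisfying

$$p(0) = -1, \quad p(1) = 0, \quad p'(1) = \alpha \in \mathbb{R}. \qquad (3.15)$$

Answer: compute the divided differences

$$\begin{array}{llll}
x_0 = 0 & f[x_0] = \boxed{-1} & & \\
 & & f[x_0, x_1] = \boxed{1} & \\
x_1 = 1 & f[x_1] = 0 & & f[x_0, x_1, x_2] = \boxed{\alpha - 1} \\
 & & f[x_1, x_2] = \alpha & \\
x_2 = 1 & f[x_2] = 0 & &
\end{array}$$

We get

$$p(x) = \boxed{-1} + \boxed{1}\,x + \boxed{\alpha - 1}\,x(x - 1).$$

In these calculations, we wrote $x_2 = x_1 + \varepsilon$, so that

$$f[x_1, x_2] = \frac{f(x_1 + \varepsilon) - f(x_1)}{\varepsilon},$$

and used the fact that $f[x_1, x_2]$ goes to $f'(1)$ as $\varepsilon$ goes to 0.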
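
The result of Example 3.1 can be checked numerically. In the sketch below, the Newton form $-1 + x + (\alpha - 1)x(x - 1)$ is expanded by hand into monomial coefficients and tested against the three conditions (3.15). The value of alpha and the variable names are arbitrary choices made for the illustration.

```matlab
% Sketch: check Example 3.1 numerically for one value of alpha (assumed alpha = 3).
alpha = 3;
% p(x) = -1 + x + (alpha-1)*x*(x-1) = (alpha-1)*x^2 + (2-alpha)*x - 1
p  = [alpha-1, 2-alpha, -1];     % decreasing powers, as polyval expects
dp = polyder(p);                 % coefficients of p'
vals = [polyval(p, 0), polyval(p, 1), polyval(dp, 1)];
disp(vals)                       % p(0), p(1), p'(1): expected -1, 0, alpha
```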


Exercise 3.8. In this exercise, $f(x) = e^{-x} \cos(3\pi x)$.

1. Write a function based on the divided differences (as in Example 3.1) that
   computes the Hermite polynomial interpolant of a function (including
   the Lagrange case). The input data of this function are the interpolation
   points $x_i$, and for each point, the maximal derivative $\alpha_i$ to be interpolated
   at this point and the values $f^{(l)}(x_i)$ for $l = 0, \ldots, \alpha_i$.

2. Compute the Lagrange interpolation of $f$ at the points 0, $\frac{1}{4}$, $\frac{3}{4}$, and 1.
   Draw $f$ and its polynomial interpolant on the interval [0,1].

3. Compute the Hermite interpolant of $f$ at the same points (with $\alpha_i = 1$).
   Draw $f$ and its Hermite polynomial on the interval [0,1]. Compare to the
   previous results.

4. Answer the same questions in the case that $f$ and $f'$ are interpolated at
   the previous points and, in addition, the point $\frac{1}{2}$.


A solution of this exercise is proposed in Sect. 3.6 at page 76. 

Exercise 3.9. Draw on [0,1], and for several values of $m$, the polynomial of
minimal degree $p$ such that

$$p(0) = 0, \quad p(1) = 1, \quad\text{and}\quad p^{(l)}(0) = p^{(l)}(1) = 0 \quad\text{for } l = 1, \ldots, m.$$


A solution of this exercise is proposed in Sect. 3.6 at page 77. 


3.3 Best Polynomial Approximation 


In this section, we look for a polynomial that is nearest to $f$ for a prescribed
norm $\|\cdot\|_X$, $X$ being a linear space that includes the polynomials. For $f \in X$,
we call a polynomial $p_n \in \mathbb{P}_n$ such that

$$\|f - p_n\|_X = \inf_{q \in \mathbb{P}_n} \|f - q\|_X \qquad (3.16)$$

a best polynomial approximation of $f$ in $\mathbb{P}_n$. The real number $\inf_{q \in \mathbb{P}_n} \|f - q\|_X$
is called the best approximation error of $f$ in $\mathbb{P}_n$, in the norm $\|\cdot\|_X$. We
consider two spaces $X$.

• Case 1. $I = [a,b]$, $X = C(I)$, the space of continuous functions equipped
  with the uniform norm, which we denote by $\|\cdot\|_\infty$. The best uniform
  approximation error is denoted by

$$E_n(f) = \inf_{q \in \mathbb{P}_n} \|f - q\|_\infty.$$

• Case 2. $I = ]a,b[$, $X = L^2(I)$, the space of measurable functions $f$ defined on
  $I$ such that the integral $\int_I |f(x)|^2\,dx$ is finite. $L^2(I)$ is equipped with the
  inner product and the norm

$$(f, g) = \int_I f(x)\,g(x)\,dx, \qquad \|f\| = \sqrt{(f, f)}.$$


3.3.1 Best Uniform Approximation 


Here $I = [a,b]$ and $f \in X = C(I)$. We seek a polynomial $p_n \in \mathbb{P}_n$, solution of
the problem

$$\|f - p_n\|_\infty = E_n(f) = \inf_{q \in \mathbb{P}_n} \|f - q\|_\infty.$$

The following definition enables the characterization of the polynomial of best
uniform approximation.

Definition 3.1. A continuous function $\varphi$ is said to be equioscillatory on $n + 1$
points of a real interval $[a,b]$ if $\varphi$ takes alternately the values $\pm\|\varphi\|_\infty$ at $(n + 1)$
points $x_0 < x_1 < \cdots < x_n$ of $[a,b]$ (see Fig. 3.3).


The following theorem is known as the alternation theorem. 

Fig. 3.3. Example of an equioscillatory function.

Theorem 3.3. Let $f$ be a continuous function defined on $I = [a,b]$. The
polynomial $p_n$ of best uniform approximation of $f$ in $\mathbb{P}_n$ is the only polynomial
in $\mathbb{P}_n$ for which the function $f - p_n$ is equioscillatory on (at least) $n + 2$ distinct
points of $I$.

For example, the best uniform approximation of a continuous function $f$ on
$[a,b]$ by constants is

$$p_0 = \frac{1}{2}\Big\{\min_{x \in [a,b]} f(x) + \max_{x \in [a,b]} f(x)\Big\},$$

and there exist (at least) two points where the function $f - p_0$ equioscillates.
These points are the two points where the continuous function reaches its
extremal values on $[a,b]$.
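
The case of constants is easy to illustrate on sampled values. The sketch below uses a hypothetical test function (exp on [0,1]) and a grid of our own choosing; it computes $p_0$ as the midpoint of the range of $f$ and displays the error at the two extremal points, where it takes the values $-E_0$ and $+E_0$.

```matlab
% Sketch: best uniform approximation by a constant on a sampled grid.
f  = @(x) exp(x);                 % hypothetical test function on [0,1]
xx = linspace(0, 1, 1001);
fx = f(xx);
p0 = (min(fx) + max(fx)) / 2;     % midpoint of the range of f
err = fx - p0;
E0  = max(abs(err));              % uniform error of the constant p0
[~, imin] = min(fx);              % index where f is minimal
[~, imax] = max(fx);              % index where f is maximal
fprintf('p0 = %.4f, error at min: %.4f, at max: %.4f\n', p0, err(imin), err(imax));
```

Any other constant increases the error at one of the two extremal points, which is the equioscillation property with $n + 2 = 2$ points.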

Hence, to determine the best uniform approximation of a function f, it 
is sufficient to find a polynomial p € P, and n+ 2 points such that f — p 
equioscillates at these points. This is what the following algorithm (called the 
Remez algorithm) does. 


The Remez algorithm 


1. Initialization. Choose any $n + 2$ distinct points $x_0^0 < x_1^0 < \cdots < x_{n+1}^0$.
2. Step $k$. Suppose the $n + 2$ points $x_0^k < x_1^k < \cdots < x_{n+1}^k$ are known.
   Compute a polynomial $p_k \in \mathbb{P}_n$ (see Exercise 3.10) such that

$$f(x_i^k) - p_k(x_i^k) = (-1)^i \{f(x_0^k) - p_k(x_0^k)\}, \qquad i = 1, \ldots, n + 1.$$

   (a) If

$$\|f - p_k\|_\infty = |f(x_i^k) - p_k(x_i^k)|, \qquad i = 0, \ldots, n + 1, \qquad (3.17)$$

       the algorithm stops, since the function $f - p_k$ equioscillates at these
       points. Hence $p_k$ is the polynomial of best uniform approximation of
       $f$.

   (b) Otherwise, there exists $y \in [a,b]$ such that for all $i = 0, \ldots, n + 1$,

$$\|f - p_k\|_\infty = |f(y) - p_k(y)| > |f(x_i^k) - p_k(x_i^k)|. \qquad (3.18)$$

       Design a new set of points $x_0^{k+1} < x_1^{k+1} < \cdots < x_{n+1}^{k+1}$ by replacing
       one of the points $x_i^k$ by $y$ in such a way that

$$\big(f(x_j^{k+1}) - p_k(x_j^{k+1})\big)\big(f(x_{j-1}^{k+1}) - p_k(x_{j-1}^{k+1})\big) \leq 0, \qquad j = 1, \ldots, n + 1.$$
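
One step of the algorithm can be sketched as follows. This is only an illustration under our own assumptions: the test function (exp), the degree, and the starting points are arbitrary, and the polynomial is obtained from an equivalent formulation in which the common level $e = f(x_0) - p(x_0)$ is treated as an extra unknown, so that $p(x_i) + (-1)^i e = f(x_i)$ becomes a square linear system.

```matlab
% Sketch of step k of the Remez algorithm: find p in P_n and a level e with
% f(x_i) - p(x_i) = (-1)^i e at the n+2 current points.
f = @(x) exp(x);                        % hypothetical test function on [0,1]
n = 2;
x = linspace(0, 1, n+2)';               % current points x_0 < ... < x_{n+1}
V = ones(n+2, n+1);
for j = 1:n
    V(:, j+1) = x.^j;                   % columns 1, x, ..., x^n
end
s   = (-1).^(0:n+1)';                   % alternating signs (-1)^i
sol = [V, s] \ f(x);                    % unknowns (a_0,...,a_n, e)
a = sol(1:n+1); e = sol(end);
pc = flipud(a).';                       % decreasing powers for polyval
r  = f(x) - polyval(pc, x);             % residual at the points, equal to s*e
% Stopping test (3.17): compare |e| with the uniform error on a fine grid
xx   = unique([linspace(0, 1, 1001)'; x]);
unif = max(abs(f(xx) - polyval(pc, xx)));
fprintf('|e| = %.3e, uniform error on the grid = %.3e\n', abs(e), unif);
```

If the two printed numbers differ by more than the tolerance, a point y where the grid maximum is reached replaces one of the current points, as described in (b).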


Exercise 3.10. Prove the existence of a unique polynomial $p_k \in \mathbb{P}_n$ defined
in step $k$ of the Remez algorithm. Program a function that computes this
polynomial (the input data are the $n + 2$ points $x_i$ and a function $f$).

Hint: write $p_k(t) = \sum_{j=0}^{n} a_j t^j$ and use MATLAB to solve the linear system
whose solution is $(a_0, \ldots, a_n)^T$.
A solution of this exercise is proposed in Sect. 3.6 at page 78. 





Exercise 3.11. Remez algorithm. 

The goal is to compute the best uniform approximation of the function
$x \mapsto \sin(2\pi \cos(\pi x))$ on [0,1] by the Remez algorithm. Discuss all the possible cases
in point (b) (see the algorithm): $y < \min_i x_i^k$, $y > \max_i x_i^k$, $y \in [x_i^k, x_{i+1}^k]$, and
$(f(x_i^k) - p_k(x_i^k))(f(y) - p_k(y)) > 0$ or $(f(x_i^k) - p_k(x_i^k))(f(y) - p_k(y)) < 0$. To
check the inequality (3.18):

• compute $\|f - p_k\|_\infty$ on a uniform grid of 100 points in the interval [0,1];

• the equality (3.17) of the algorithm is supposed true if the absolute value
  of the difference between the two quantities is smaller than a prescribed
  tolerance ($10^{-8}$ for example).

Compare the results (in terms of the number of iterations required for the
convergence of the algorithm) for the three choices of initialization points:

• equidistant points: $x_i = \frac{i}{n+1}$, $i = 0, \ldots, n + 1$;

• Chebyshev points: $x_i = \frac{1}{2}\big(1 - \cos\frac{i\pi}{n+1}\big)$, $i = 0, \ldots, n + 1$;

• random points: the $x_i$ are $n + 1$ points given by the function rand then
  sorted.


A solution of this exercise is proposed in Sect. 3.6 at page 79. 


3.3.2 Best Hilbertian Approximation 


Here $I = ]-1,1[$ since every interval $]a,b[$ can be mapped to $I$ by a simple
affine transformation. The Hilbertian structure of $X = L^2(I)$ extends
to this infinite-dimensional space some very usual notions such as basis and
orthogonal projection. See, for example, Schwartz (1980) for the definitions
and results of this section. We are interested in the determination of the best
approximation of a function in $L^2(I)$ by polynomials of a prescribed degree.

Hilbertian Basis

The Legendre polynomials are defined by the recurrence relation (see also
Chap. 5)

$$(n + 1)\,L_{n+1}(x) = (2n + 1)\,x\,L_n(x) - n\,L_{n-1}(x) \qquad (\forall n \geq 1)$$

with $L_0(x) = 1$ and $L_1(x) = x$. The degree of $L_n$ is $n$, and for all integers $n$
and $m$,

$$(L_n, L_m) = \begin{cases} 0 & \text{if } n \neq m, \\ \dfrac{1}{n + 1/2} & \text{if } n = m. \end{cases}$$

These polynomials are said to be orthogonal. We display in Fig. 3.4 some
Legendre polynomials. The family $L_n^* = L_n / \|L_n\|$ forms a Hilbertian basis of
$L^2(I)$, that is, $(L_n^*)_{n \geq 0}$ is orthonormal and the set of all finite linear
combinations of the $L_n^*$ is dense in $L^2(I)$. As in finite dimension, we can expand every
function in $L^2(I)$ in the (infinite) Legendre basis.

Fig. 3.4. Example of orthogonal polynomials: the Legendre polynomials.
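
The recurrence and the orthogonality relations can be checked with a few lines of MATLAB. In this sketch (variable names and the maximal degree N are our own choices), each polynomial is stored as a coefficient vector in decreasing powers, and the inner products are computed exactly with polyint rather than by quadrature.

```matlab
% Sketch: Legendre polynomials from the three-term recurrence, stored as
% coefficient vectors in decreasing powers (polyval convention).
N = 5;                                   % highest degree (assumed)
Lg = cell(N+1, 1);                       % Lg{k+1} holds the coefficients of L_k
Lg{1} = 1;                               % L_0(x) = 1
Lg{2} = [1 0];                           % L_1(x) = x
for n = 1:N-1
    xLn  = [Lg{n+1}, 0];                 % multiply L_n by x
    Lnm1 = [0, 0, Lg{n}];                % pad L_{n-1} to the same length
    Lg{n+2} = ((2*n+1)*xLn - n*Lnm1) / (n+1);
end
% Gram matrix G(i,j) = (L_{i-1}, L_{j-1}) by exact integration on [-1,1]
G = zeros(N+1);
for i = 1:N+1
    for j = 1:N+1
        q = polyint(conv(Lg{i}, Lg{j})); % antiderivative of the product
        G(i,j) = polyval(q, 1) - polyval(q, -1);
    end
end
disp(diag(G)' .* ((0:N) + 0.5))          % each entry should be close to 1
```

The off-diagonal entries of G should vanish up to rounding, and the diagonal entries should equal $1/(n + 1/2)$.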





Theorem 3.4. Let $f \in L^2(I)$ and $n \in \mathbb{N}$.

1. $f$ has a Legendre expansion: i.e., there exist real numbers $\hat f_k$ such that

$$f = \sum_{k=0}^{\infty} \hat f_k L_k. \qquad (3.19)$$

2. There exists a unique polynomial in $\mathbb{P}_n$ (which we denote by $\pi_n f$) of best
   Hilbertian approximation of $f$ in $\mathbb{P}_n$, i.e.,

$$\|f - \pi_n f\| = \inf_{q \in \mathbb{P}_n} \|f - q\|.$$

   Moreover, $\pi_n f$ is characterized by the orthogonality relations (see Fig. 3.5)

$$(f - \pi_n f, p) = 0, \qquad \forall p \in \mathbb{P}_n, \qquad (3.20)$$

   which means that $\pi_n f$ is the orthogonal projection of $f$ on $\mathbb{P}_n$.

Fig. 3.5. The best Hilbertian approximation of $f$ is its orthogonal projection on $\mathbb{P}_n$.

The real numbers $\hat f_k$ in (3.19) are called the Legendre (or Fourier-Legendre)
coefficients of the function $f$. We deduce from the orthogonality of the Legendre
polynomials that

$$\hat f_k = \frac{(f, L_k)}{\|L_k\|^2} = \Big(k + \frac{1}{2}\Big) \int_{-1}^{1} f(x)\,L_k(x)\,dx \qquad (3.21)$$

and $\pi_n f$ is the Legendre series of $f$, truncated to the order $n$:

$$\pi_n f = \sum_{k=0}^{n} \hat f_k L_k. \qquad (3.22)$$


The computation of the best approximation of a function consists mainly in 
computing its Legendre coefficients. Since the integral in (3.21) can rarely be 
evaluated exactly, a numerical quadrature is required. See Chap. 5, where the 
Legendre polynomials are also used to solve a differential equation. 
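
A possible way to carry out this computation is sketched below. It assumes a hypothetical function (exp), a small degree n, and the MATLAB function integral with tightened tolerances as the quadrature rule; none of these choices comes from the text. The Legendre coefficients are computed from the integral formula (3.21), the truncated series is assembled as a single coefficient vector, and the orthogonality relation (3.20) is tested with $p = x^2$.

```matlab
% Sketch: Legendre coefficients (3.21) by numerical quadrature and the
% truncated series (3.22). The function f and the degree n are assumed.
f = @(x) exp(x);
n = 4;
Lg = cell(n+1, 1);                       % Lg{k+1}: coefficients of L_k, decreasing powers
Lg{1} = 1; Lg{2} = [1 0];
for m = 1:n-1                            % (m+1) L_{m+1} = (2m+1) x L_m - m L_{m-1}
    Lg{m+2} = ((2*m+1)*[Lg{m+1}, 0] - m*[0, 0, Lg{m}]) / (m+1);
end
opts = {'RelTol', 1e-12, 'AbsTol', 1e-14};   % quadrature tolerances (assumed)
fhat = zeros(n+1, 1);
for k = 0:n
    g = @(x) f(x) .* polyval(Lg{k+1}, x);
    fhat(k+1) = (k + 0.5) * integral(g, -1, 1, opts{:});
end
c = zeros(1, n+1);                       % coefficients of pi_n f, decreasing powers
for k = 0:n
    c = c + fhat(k+1) * [zeros(1, n-k), Lg{k+1}];
end
% Orthogonality relation (3.20) tested with p(x) = x^2
res = integral(@(x) (f(x) - polyval(c, x)) .* x.^2, -1, 1, opts{:});
fprintf('(f - pi_n f, x^2) = %.2e\n', res);
```

For this particular f, the first coefficient is known in closed form, $\hat f_0 = \sinh(1)$, which gives a simple check of the quadrature.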

The convergence of the best Hilbertian approximation is stated in the 
following proposition. 


Proposition 3.5. For all $f \in L^2(I)$,

$$\lim_{n \to +\infty} \|f - \pi_n f\| = 0. \qquad (3.23)$$

3.3.3 Discrete Least Squares Approximation


In this section, we seek the best polynomial approximation of a function $f$
with respect to a discrete norm. Given $m$ distinct points $(x_i)_{i=1}^{m}$ and $m$ values
$(y_i)_{i=1}^{m}$, the goal is to determine a polynomial $p = \sum_{j=0}^{n-1} a_j x^j \in \mathbb{P}_{n-1}$ that
minimizes the expression

$$E = \sum_{i=1}^{m} |y_i - p(x_i)|^2, \qquad (3.24)$$

with $m$, in general, much larger than $n$. From a geometrical point of view,
the problem is to find $p$ such that its graph is as close as possible (in the
Euclidean norm sense) to the points $(x_i, y_i)$. The function $E$ defined by (3.24)
is a function of $n$ variables $(a_0, a_1, \ldots, a_{n-1})$. To determine its minimum, we
compute the partial derivatives

$$\frac{\partial E}{\partial a_j} = 2\Big(\sum_{k=0}^{n-1}\Big(\sum_{i=1}^{m} x_i^{k+j}\Big) a_k - \sum_{i=1}^{m} x_i^{j} y_i\Big), \qquad j = 0, \ldots, n - 1.$$

Hence the vector $a = (a_0, \ldots, a_{n-1})^T$ whose components are the coefficients
of the polynomial where the minimum of $E$ is reached is a solution of the
linear system

$$\tilde A a = \tilde b, \qquad (3.25)$$

with the matrix $\tilde A$ and the right-hand side $\tilde b$ defined by

$$\tilde A_{j,k} = \sum_{i=1}^{m} x_i^{j+k}, \qquad \tilde b_j = \sum_{i=1}^{m} x_i^{j} y_i, \qquad j, k = 0, \ldots, n - 1.$$


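First of all, consider the case $n = 2$, corresponding to the determination
of a straight line called the regression line. In this case the matrix $\tilde A$ and the
vector $\tilde b$ are

$$\tilde A = \begin{pmatrix} m & \sum_{i=1}^{m} x_i \\ \sum_{i=1}^{m} x_i & \sum_{i=1}^{m} x_i^2 \end{pmatrix}, \qquad \tilde b = \begin{pmatrix} \sum_{i=1}^{m} y_i \\ \sum_{i=1}^{m} x_i y_i \end{pmatrix}. \qquad (3.26)$$

The determinant of $\tilde A$,

$$\Delta = m \sum_{i=1}^{m} x_i^2 - \Big(\sum_{i=1}^{m} x_i\Big)^2 = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} (x_i - x_j)^2,$$

vanishes only if all the points $x_i$ are identical. Hence the matrix $\tilde A$ is invertible
and the system (3.26) has a unique solution.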
Let us go back to the general case. Noticing that the Vandermonde matrix

$$A = \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^{n-1} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

is such that $\tilde A = A^T A$ and $\tilde b = A^T b$ with $b = (y_1, \ldots, y_m)^T$, we can write the
system (3.25) as

$$A^T A a = A^T b. \qquad (3.27)$$

These equations are called normal equations. The following theorem tells us
that the solutions of (3.27) are the solutions of the minimization problem: find
$a \in \mathbb{R}^n$ such that

$$\|Aa - b\| = \inf_{x \in \mathbb{R}^n} \|Ax - b\|. \qquad (3.28)$$


Theorem 3.5. A vector $a \in \mathbb{R}^n$ is a solution of the normal equations (3.27) if
and only if $a$ is a solution of the minimization problem (3.28).


Hence to solve the least squares problem, one can either solve the problem 
(3.28) by some optimization algorithms, or solve the problem (3.27) by some 
linear system solvers. See Allaire and Kaber (2006), for instance. 

To compute a polynomial least squares approximation with MATLAB, use 
the instruction polyfit(x,y,n) with x a vector that contains the values x;, y 
a vector that contains the y;, and n the degree of the least squares polynomial. 
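
The three routes mentioned above (normal equations, direct minimization, and polyfit) can be compared on a small example. The data below are synthetic values chosen for the illustration, and the variable names are ours. Note that polyfit returns the coefficients in decreasing powers, whereas the vector $a$ of the text is ordered by increasing powers.

```matlab
% Sketch: three routes to the same least squares polynomial (assumed data).
m = 50; n = 3;                        % m points, polynomial in P_{n-1}
x = linspace(0, 1, m)';
y = sin(2*pi*x) + 0.1*cos(40*x);      % hypothetical values y_i
A = ones(m, n);
for j = 2:n
    A(:, j) = x.^(j-1);               % Vandermonde matrix: columns 1, x, ..., x^{n-1}
end
a_normal = (A'*A) \ (A'*y);           % normal equations (3.27)
a_ls     = A \ y;                     % backslash on a rectangular system: least squares (3.28)
a_pf     = polyfit(x, y, n-1);        % MATLAB built-in, decreasing powers
disp([a_normal, a_ls, flipud(a_pf(:))])
```

For a low degree the three columns agree to many digits. For larger n the normal equations square the condition number of $A$, so the backslash solution applied to $A$ itself is usually the safer choice.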


Exercise 3.12. Compute the least squares approximation of the function
$f(x) = \sin(2\pi \cos(\pi x))$ defined in Exercise 3.11. The optimal degree $n$ could
be determined in the following way. Starting from $n = 0$, one increases $n$ in
steps of 1 until the relative error $|e_n - e_{n-1}|/e_{n-1}$ becomes smaller than a
prescribed value (5 for example). Here we set en = ||z — pn(x)]lo- 

A solution of this exercise is proposed in Sect. 3.6 at page 80. 





3.4 Piecewise Polynomial Approximation 


We display in Fig. 3.6 some Lagrange polynomial interpolants of the function
$f$, defined on [0,1] by

$$f(x) = \begin{cases} 1 & \text{for } 0 \leq x \leq 0.25, \\ 2 - 4x & \text{for } 0.25 \leq x \leq 0.5, \\ 0 & \text{for } 0.5 \leq x \leq 1, \end{cases}$$

at respectively 4, 6, 8, and 10 points. Obviously, there is a problem due to
the lack of global regularity of $f$ over the interval $I = [0,1]$. However, this
function has a very simple structure; it is affine on each interval $[0, \frac{1}{4}]$, $[\frac{1}{4}, \frac{1}{2}]$,
and $[\frac{1}{2}, 1]$.

Fig. 3.6. Polynomial interpolation of a piecewise polynomial function.

Let $f$ be a continuous function defined on the interval [0,1]. The goal is to
approximate $f$ by a piecewise polynomial function $S$. Such a function is called
a spline. The use of piecewise polynomials is a way to control the problems
related to the lack of global regularity of $f$. Another practical reason is the
stability of the numerical computations: it is better to use several polynomials
of low degree than one polynomial with high degree.

The interval $I = [0,1]$ is divided into subintervals $I_i = [x_i, x_{i+1}]$ for
$i = 1, \ldots, n - 1$. On each subinterval $I_i$, the function $f$ is approximated by a
polynomial $p_{k,i}$ of degree $k$. We denote by $S_k$ the piecewise polynomial that
coincides with $p_{k,i}$ on each interval $I_i$ and satisfies some global regularity
condition on the interval $I$: continuity, differentiability up to some order, etc.





3.4.1 Piecewise Constant Approximation 


Let $S_0$ be a function that is constant on each interval $I_i$ and interpolates $f$ at
the points $x_{i+1/2} = (x_i + x_{i+1})/2$:

$$S_0|_{I_i}(x) = f(x_{i+1/2}).$$

Suppose the function $f$ is in $C^1(I)$. According to Proposition 3.2, for all $x \in I_i$,
there exists $\xi_{x,i} \in I_i$ such that

$$f(x) - S_0(x) = (x - x_{i+1/2})\,f'(\xi_{x,i}).$$

We deduce from this that if the points $x_i$ are equidistant ($x_{i+1} - x_i = h = 1/n$)
then

$$\|f - S_0\|_\infty \leq \frac{h}{2} M_1, \qquad (3.29)$$

with $M_1$ an upper bound of $|f'|$ on $I$. Hence, as $h$ goes to 0, $S_0$ converges
uniformly toward $f$.

Fig. 3.7. From top to bottom: examples of piecewise constant, affine, and cubic
approximations.


Remark 3.3. The power of $h$ in (3.29) indicates that if the discretization
parameter $h$ is divided by a constant $c > 0$, the bound on the error $\|f - S_0\|_\infty$
is divided by the same constant $c$.


Exercise 3.13. Let $f : [0,1] \to \mathbb{R}$, $f(x) = \sin(4\pi x)$. Draw the curve
$\ln n \mapsto \ln \|f - S_0\|_\infty$ and check (an approximation of) the estimate (3.29). Take the
values $n = 10k$ with $k = 1, \ldots, 10$.

A solution of this exercise is proposed in Sect. 3.6 at page 80. 


3.4.2 Piecewise Affine Approximation 


This time, the approximation $S_1$ is affine on each interval $I_i$ and coincides
with $f$ at the points $x_i$ and $x_{i+1}$:

$$S_1|_{I_i}(x) = \frac{f(x_{i+1}) - f(x_i)}{h}\,(x - x_i) + f(x_i).$$

First of all, suppose the function $f$ is in $C^2(I)$. According to Proposition 3.2,
for all $x \in I_i$, there exists $\xi_{x,i} \in I_i$ such that

$$f(x) - S_1(x) = \frac{(x - x_i)(x - x_{i+1})}{2}\,f''(\xi_{x,i}).$$

We deduce from this

$$\|f - S_1\|_\infty \leq \frac{h^2}{8} M_2, \qquad (3.30)$$

with $M_2$ an upper bound of $|f''|$ on $I$. Hence the uniform convergence of $S_1$
toward $f$.


Remark 3.4. The power of $h$ in (3.30) indicates that if the discretization
parameter $h$ is divided by a constant $c > 0$, the bound on the error $\|f - S_1\|_\infty$
is divided by $c^2$. For example, changing $h$ into $h/2$ divides the bound on the
error by 4.
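
The factor 4 can be observed on a simple experiment. The sketch below is an illustration under our own assumptions: it uses the function $\sin(4\pi x)$ of Exercise 3.13, the MATLAB function interp1 (linear by default) to build the piecewise affine interpolant, and a fine grid to estimate the uniform norm.

```matlab
% Sketch: observe the O(h^2) bound (3.30) for f(x) = sin(4*pi*x).
f  = @(x) sin(4*pi*x);
M2 = (4*pi)^2;                         % upper bound of |f''| on [0,1]
xx = linspace(0, 1, 5001);             % fine grid used to estimate the uniform norm
ns = [10 20 40 80];
err = zeros(size(ns));
for k = 1:length(ns)
    xi = linspace(0, 1, ns(k)+1);      % equidistant points, h = 1/n
    S1 = interp1(xi, f(xi), xx);       % piecewise affine interpolant
    err(k) = max(abs(f(xx) - S1));
end
h = 1 ./ ns;
disp([err; h.^2/8*M2])                 % measured error (row 1) and bound (3.30) (row 2)
ratios = err(1:end-1) ./ err(2:end);
disp(ratios)                           % expected to approach 4 as h decreases
```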


Exercise 3.14. Same questions as in the previous exercise to check the esti- 
mate (3.30). 
A solution of this exercise is proposed in Sect. 3.6 at page 82. 


If the function $f$ is only in $C^1$, convergence holds too. To prove it, write $f(x)$
as an integral,

$$f(x) = f(x_i) + \int_{x_i}^{x} f'(t)\,dt, \quad\text{and}\quad S_1(x) = f(x_i) + \frac{x - x_i}{h} \int_{x_i}^{x_{i+1}} f'(t)\,dt,$$

and use the assumed bound on $f'$,

$$\|f - S_1\|_\infty \leq 2hM_1.$$

That implies the convergence. Note that this estimate is less accurate than
(3.30), but it requires less regularity on the function $f$.

3.4.3 Piecewise Cubic Approximation


Now we seek an approximation $S_3$ in $C^2(I)$ that is cubic on each interval $I_i$
and coincides with $f$ at the points $x_i$ and $x_{i+1}$. Let $p_i$ be the restriction of $S_3$
to the interval $I_i$, for $i = 0, \ldots, n - 1$:

$$p_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i.$$

Obviously $d_i = f(x_i)$. The unknowns $a_i$, $b_i$, and $c_i$ can be expressed in terms
of the values of $f$ and its second derivative at the points $x_i$. Setting
$\alpha_i = p_i''(x_i)$ and using the continuity of the first and second derivatives of the
approximation at the points $x_i$, we get for $i = 0, \ldots, n - 1$, with $f_i = f(x_i)$,

$$b_i = \frac{1}{2}\alpha_i, \qquad a_i = \frac{\alpha_{i+1} - \alpha_i}{6h}, \qquad c_i = \frac{f_{i+1} - f_i}{h} - \frac{2\alpha_i + \alpha_{i+1}}{6}\,h,$$

and a recurrence relation between the values $\alpha_{i-1}$, $\alpha_i$, and $\alpha_{i+1}$:

$$h(\alpha_{i-1} + 4\alpha_i + \alpha_{i+1}) = \frac{6}{h}(f_{i-1} - 2f_i + f_{i+1}).$$

We have to add to these $n - 1$ equations two other equations in order to close
the system and compute the $n + 1$ unknowns $\alpha_i$. Several choices of these two
equations exist. If $\alpha_0$ and $\alpha_n$ are fixed, say

$$\alpha_0 = \alpha_n = 0, \qquad (3.31)$$

the vector $\alpha = (\alpha_1, \ldots, \alpha_{n-1})^T$ is a solution of the linear tridiagonal system
$Ax = b$, with

$$A = \begin{pmatrix} 4 & 1 & 0 & \cdots & 0 \\ 1 & 4 & 1 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & 1 & 4 & 1 \\ 0 & \cdots & 0 & 1 & 4 \end{pmatrix}, \qquad b = \frac{6}{h^2} \begin{pmatrix} f_0 - 2f_1 + f_2 \\ \vdots \\ f_{i-1} - 2f_i + f_{i+1} \\ \vdots \\ f_{n-2} - 2f_{n-1} + f_n \end{pmatrix}. \qquad (3.32)$$

The matrix $A$ is invertible since its diagonal is strictly dominant.
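
The construction can be sketched as follows: assemble the tridiagonal system for the second derivatives $\alpha_i$, solve it, then recover the coefficients of each cubic $p_i$. The sketch assumes the conditions $\alpha_0 = \alpha_n = 0$, the test function $\sin(4\pi x)$, and a full (not sparse) storage of the matrix for simplicity; the variable names are ours (bb stands for the coefficients $b_i$ to avoid a clash with the right-hand side b).

```matlab
% Sketch: assemble and solve the tridiagonal system for alpha_1,...,alpha_{n-1},
% then compute the coefficients of the cubic pieces p_i.
f = @(x) sin(4*pi*x);                  % assumed test function
n = 10; h = 1/n;
x  = (0:n)'/n;  fx = f(x);
A = 4*eye(n-1) + diag(ones(n-2,1), 1) + diag(ones(n-2,1), -1);
b = 6/h^2 * (fx(1:n-1) - 2*fx(2:n) + fx(3:n+1));
alpha = [0; A\b; 0];                   % alpha_0 = alpha_n = 0
% Coefficients of p_i on I_i = [x_i, x_{i+1}], i = 0,...,n-1
a  = (alpha(2:n+1) - alpha(1:n)) / (6*h);
bb = alpha(1:n) / 2;
c  = (fx(2:n+1) - fx(1:n))/h - h*(2*alpha(1:n) + alpha(2:n+1))/6;
d  = fx(1:n);
% Consistency checks: interpolation at the right end of each interval,
% and continuity of the first derivative at the interior points
right = a*h^3 + bb*h^2 + c*h + d;                          % p_i(x_{i+1})
jump  = 3*a(1:n-1)*h^2 + 2*bb(1:n-1)*h + c(1:n-1) - c(2:n); % p_i' - p_{i+1}' at x_{i+1}
fprintf('max interpolation defect: %.2e, max derivative jump: %.2e\n', ...
        max(abs(right - fx(2:n+1))), max(abs(jump)));
```

Both printed quantities should be at the level of rounding errors. For large n, a sparse storage of $A$ (for instance with spdiags) would be preferable.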


Exercise 3.15. Write a program that computes the cubic spline with the
conditions (3.31) and the $n + 1$ points $(i/n)_{i=0}^{n}$. Test your program with the
function $f(x) = \sin(4\pi x)$. Take $n = 5$, then $n = 10$. Draw on the same plot the
function $f$ and the spline. In order to see the behavior of the spline between
two interpolation points, add ten or twenty points of representation in each
interval $I_i$ to get a very fine plot.

A solution of this exercise is proposed in Sect. 3.6 at page 82. 





3.5 Further Reading 


For the general theory of polynomial approximation, we refer the reader to 
Rivlin (1981) and DeVore and Lorentz (1993). 

The Legendre polynomials are used in Chap. 5 to solve a differential equa- 
tion. We refer the reader to Bernardi and Maday (1997) for the use of spectral 
methods in numerical analysis. 

Related to the splines are the Bézier curves, which have many applications 
in computer-aided geometric design; see Chap. 9. 

Wavelets are used in Chap. 6 for image processing purposes. See Cohen 
(2003) for the numerical analysis of wavelets. 

We did not consider in this chapter trigonometric approximation. Of
course, all the results stated here are valid with very minor modifications
to the approximation of periodic functions: existence and uniqueness of a best
polynomial approximation, interpolation, etc. There exists a very efficient
algorithm to compute the Fourier coefficients of a function from its pointwise
values or the reverse; it is the famous fast Fourier transform. In Chap. 12, the
trigonometric approximation is used to solve the Navier-Stokes equations.


3.6 Solutions and Programs 


Solution of Exercise 3.1 


1. $b = (f(x_0), \ldots, f(x_n))^T$ and

$$A = \begin{pmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^n \end{pmatrix}.$$

2. n=10;x=sort(rand(n+1,1));
   A=ones(length(x),1);
   for k=1:length(x)-1
      A=[A x.^k];
   end;

3. cf=A\test1(x);
   %reordering of the coefficients
   cf=cf(end:-1:1);
   y=polyval(cf,x);
   plot(x,test1(x),x,y,'r+');

   the function test1 is defined by
   test1=inline('sin(10.*x.*cos(x))');


4. $A$ is a Vandermonde matrix; it is invertible if all the points are distinct.
   That is the case in this experiment. However, for MATLAB, $Aa - b$ is
   not zero. This is due to the very bad condition number of the matrix
   $A$. For large values of $n$ (say $n = 20$), the matrix $A$ becomes a singular
   matrix for MATLAB: the numerical rank of the matrix $A$ computed by
   MATLAB is 18, while the right one is $n + 1$. Recall that the condition
   number of a matrix $A$ measures the sensitivity of the linear system $Ax = b$ to
   perturbations of the data $A$ or $b$.





See the script in the file APP_ApproxScript1.m. 

Solution of Exercise 3.2


The following script is written in the file APP_ApproxScript2.m. It uses the
function APP_condVanderMonde defined below:

   %Condition number of a Vandermonde matrix
   N=2:2:20;cd=[ ];
   for n=N
      cd=[cd APP_condVanderMonde(n)];
   end;
   plot(N,log(cd),'+-')


The function APP_condVanderMonde is defined as below:

   function y=APP_condVanderMonde(n)
   %compute the condition number of a Vandermonde matrix
   %The n+1 points are uniformly chosen between 0 and 1.
   x=(0:n)'/n;
   A=ones(length(x),1);
   for k=1:length(x)-1
      A=[A x.^k];
   end;
   y=cond(A);


We deduce from Fig. 3.8 that ln(cond(A)), as a function of $n$, is a straight
line. Hence cond(A) grows exponentially with $n$.





Fig. 3.8. Logarithm of the condition number of a Vandermonde matrix.


The reader is asked to compare the function APP_condVanderMonde to the
following function in terms of numerical complexity:

   function c=APP_condVanderMondeBis(n)
   %compute the condition number of a Vandermonde matrix
   %The n+1 points are uniformly chosen between 0 and 1.
   x=(0:n)'/n;
   A=ones(length(x),1);y=x;
   for k=1:length(x)-1
      A=[A y];y=y.*x;
   end;
   c=cond(A);


Solution of Exercise 3.3 


   %The Lagrange basis
   n=10;x=(0:n)'/n;i=round(n/2);
   twon=n;g=(0:twon)'/twon;
   y=zeros(size(x));y(i)=1;cf=polyfit(x,y,n);
   y0=polyval(cf,0);


For $n = 5$ or 10 everything goes well, since the program computes a value
for $\ell_{n/2}(x_0)$ close to the exact value 0. But for $n = 20$, the computation of
$\ell_{n/2}(x_0)$ gives $-1.0460$. This is again a consequence of the ill conditioning of
the matrix. Note that in that case, MATLAB displays a warning message. See
the script in APP_ApproxScript8.m.


Solution of Exercise 3.4 





1. The function APP_dd defined below computes the divided differences.

   function c=APP_dd(x)
   % x contains the points xi
   % c contains the divided differences
   c=test1(x); % warning: "test1" is defined
               % either in another file or "inline"
   n=length(x);
   for p=1:n-1
      for k=n:-1:p+1
         c(k)=(c(k)-c(k-1))/(x(k)-x(k-p));
      end;
   end;


It is sometimes useful to send the name of a function as an input parameter
of APP_dd. This is possible up to a slight modification of the first two lines:
see also Section 5.6 of Chap. 5.

   function c=APP_dd(x,f)
   c=feval(f,x);
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2. Running the script below produces Fig. 3.9. This script is available un- 
der the name APP_ApproxScript{.m as well as the functions APP_dd and 
APP_interpol: 


function y=APP_interpol(c,x,g)
% compute the interpolation of the function f on the grid g
% knowing the divided differences c computed at the points x
n=length(c);
y=c(n)*ones(size(g));
for k=n-1:-1:1
   y=c(k)+y.*(g-x(k));
end;


n=20;x=(0:n)'/n;g=0:0.01:1;
c=APP_dd(x);y=APP_interpol(c,x,g);
yg=test1(g);plot(g,yg,g,y,'r+')
hold on;yx=test1(x);plot(x,yx,'o');hold off

Fig. 3.9. Computation of a Lagrange interpolant.
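The divided differences and the Horner-type evaluation can be checked together in a self-contained way. The sketch below is an illustration only: the test function f and the value n = 8 are hypothetical choices (the function test1 of the book is not reproduced here), and the two loops are inlined instead of calling APP_dd and APP_interpol.

```matlab
% Sketch: Newton divided differences followed by a Horner-type evaluation.
f = @(t) 1./(1+t.^2);             % hypothetical test function
n = 8; x = (0:n)'/n;              % n+1 equidistant points
c = f(x);                         % start from the function values
for p = 1:n
    for k = n+1:-1:p+1
        c(k) = (c(k)-c(k-1))/(x(k)-x(k-p));
    end
end
g = [x' linspace(0,1,101)];       % nodes first, then a fine grid
y = c(n+1)*ones(size(g));
for k = n:-1:1
    y = c(k) + y.*(g-x(k));
end
errNodes = max(abs(y(1:n+1) - f(x')));   % interpolation conditions
errGrid  = max(abs(y - f(g)));           % error on the whole grid
fprintf('error at the nodes: %g, on the grid: %g\n', errNodes, errGrid);
```

The error at the nodes should be at the level of rounding errors, since the polynomial interpolates f at these points.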


Solution of Exercise 3.5 


The script of this exercise is available under the name APP_ApproxScripti.m 
as well as the function APP_Lebesgue. 





1. Computation of the Lebesgue constant: 


function leb=APP_Lebesgue(x)
% Computation of the Lebesgue constant related
% to the points in the array x
n=length(x)-1;
xx=linspace(min(x),max(x),100);
% fine grid of 100 points
y=zeros(size(xx));
for i=1:n+1;
   % computation of l_i(x)
   l=zeros(size(x));l(i)=1;cf=polyfit(x,l,n);
   y=y+abs(polyval(cf,xx));
end;
leb=max(y);


2. The uniform case: 


ind=[ ];lebE=[ ];
for n=10:5:30
   x=(-n/2:n/2)/n*2; % equidistant points
   l=APP_Lebesgue(x);
   ind=[ind;n];lebE=[lebE;l];
end;
figure(1);plot(ind,log(lebE),'+-')





Fig. 3.10. The Lebesgue constant associated with equidistant points: $n \mapsto \ln(\Lambda(n))$.





We note in Fig. 3.10 that $\ln(\Lambda(n))$, as a function of n, is a straight line
with slope (approximately) 1/2. Hence $\Lambda(n) \approx e^{n/2}$.
3. The Chebyshev case: 


ind=[ ];lebT=[ ];
for n=10:5:30
   x=cos(pi*(.5+n:-1:0)/(n+1)); % Chebyshev points
   l=APP_Lebesgue(x);
   ind=[ind;n];lebT=[lebT;l];
end;
figure(2);plot(log(ind),lebT,'+-')





Fig. 3.11. The Lebesgue constant associated with the Chebyshev points: $\ln n \mapsto \Lambda_T(n)$.





We note in Fig. 3.11 that $\Lambda_T(n)$, as a function of $\ln n$, is a straight line with
slope (approximately) 0.6, hence $\Lambda_T(n) \approx 0.6\,\ln(n)$. Indeed, one can prove
rigorously that $\Lambda_T(n) \approx \frac{2}{\pi}\ln n$.
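The slope can also be estimated without polyfit-based Lagrange polynomials. The sketch below is a variant, not the book's script: it evaluates each Lagrange polynomial by its product formula, and the grid of 2001 points and the variable names are assumptions made for this illustration.

```matlab
% Sketch: Lebesgue constant at the Chebyshev points, with the Lagrange
% polynomials evaluated by their product formula (a swapped-in technique,
% used here instead of polyfit/polyval).
nn = 10:5:30; leb = zeros(size(nn));
xx = linspace(-1,1,2001);               % fine grid, includes the endpoints
for j = 1:length(nn)
    n = nn(j);
    x = cos(pi*((0:n)+0.5)/(n+1));      % n+1 Chebyshev points
    s = zeros(size(xx));
    for i = 1:n+1
        li = ones(size(xx));
        for m = [1:i-1, i+1:n+1]
            li = li.*(xx-x(m))/(x(i)-x(m));
        end
        s = s + abs(li);                % sum of |l_i| on the fine grid
    end
    leb(j) = max(s);
end
p = polyfit(log(nn), leb, 1);           % p(1): slope with respect to ln(n)
fprintf('slope = %g, 2/pi = %g\n', p(1), 2/pi);
```

The fitted slope is expected to be close to, and slightly below, 2/pi on this range of n.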


Solution of Exercise 3.6 


For the function $f_1$, the results with n = 30 and n = 40 are shown in Fig. 3.12:
the method converges slowly. For the function $f_2$, the results with n = 10 and
n = 30 are displayed in Fig. 3.13. It seems that the method converges, and
in fact it does. One can prove that the interpolation at the Chebyshev points
converges for functions of class $C^1$. This is the case for $f_2$, but not for $f_1$. See
the script in APP_Interpolation.m.

Fig. 3.12. Interpolation at the Chebyshev points: (a) n = 30 and (b) n = 40.

Fig. 3.13. Interpolation at the Chebyshev points: (a) n = 10 and (b) n = 30.


Solution of Exercise 3.7 


See the script in APP_Runge.m. The results for n = 10, n = 20, and n = 30 
are shown in Fig. 3.14: the method diverges at the boundaries. Note that in 
this case the function to be interpolated is very smooth, but its Lebesgue 
constant “explodes” (see Remark 3.1). 

Fig. 3.14. Runge phenomenon. From left to right: interpolation at 11, 21, and 31
equidistant points.


Solution of Exercise 3.8 


1. Computation of the Hermite interpolant using divided differences. See the
function APP_ddHermite. The input data of this function is an array Tab
such that, for each i:

• Tab(i,1) contains the point $x_i$.
• Tab(i,2) contains an integer $\alpha_i$: the function and its derivatives up
  to order $\alpha_i$ are interpolated at $x_i$.
• Tab(i,3:Tab(i,2)+3) contains the values of the function at the point $x_i$
  and of its derivatives up to order $\alpha_i$.

The call [xx,dd]=APP_ddHermite(Tab) returns two vectors:

• the vector xx contains the points $x_i$ taking into account their "multiplicity":
  if the function and its $\alpha_i$ derivatives have to be interpolated
  at the point $x_i$, this point is copied $\alpha_i + 1$ times in xx.
• the vector dd contains the divided differences.

With the help of these two vectors, we can implement the Horner algorithm
(see page 53) to evaluate the Hermite interpolant of f at a point x.

2. This is done vectorwise as follows: 


f=inline('cos(3*pi*x).*exp(-x)');
coll=[0 1/4 3/4 1]';
T=[coll zeros(size(coll)) f(coll)];
[xx,dd]=APP_ddHermite(T);
% plot the function on a fine grid
x=linspace(0,1,100);n=length(dd);
y=dd(n)*ones(size(x));
for k=n-1:-1:1
   y=dd(k)+y.*(x-xx(k));
end;
plot(x,y,x,f(x),'r');hold on;plot(coll,f(coll),'o')


See the script in APP_ApproxScript8.m. We see in Fig. 3.15 (a) that the 
curves intersect at only four points; they do not match at all, except at 
these points. 

3. Imposing the matching of the derivatives forces the polynomial to fit the 
function more closely (see Fig. 3.15(b)). However, there is still a region of 
the interval [0,1] where the two curves are not close to each other. 

4. By adding another interpolation point in this region, the approximation
is much sharper, as shown in Fig. 3.15(c).


Solution of Exercise 3.9 


The implementation is done in the script in APP_scriptHermite.m. For several
values $m = 5(m' - 1)$ with $m' \in \{1,2,3,4,5\}$, we compute the corresponding
polynomial $p_m$ with the help of the function APP_ddHermite. The column $m'$
of the matrix Y contains the values of $p_m$ on a uniform fine grid of 100 points
in [0,1]. Figure 3.16 is generated by the script.

Fig. 3.15. (a) The Lagrange interpolation polynomial $p_3$ at the points 0, 1/4, 3/4, and 1,
(b) the Hermite interpolation polynomial $p_7$ at the same points, and (c) the Hermite
interpolation polynomial $p_9$ at the points 0, 1/4, 1/2, 3/4, and 1. The interpolation points
are marked by circles.

Fig. 3.16. Hermite polynomial interpolants.


Solution of Exercise 3.10 


The equations form a linear system of n + 1 equations and n + 1 unknowns 
(the coefficients of the polynomial). This system has a unique solution if and 
only if the unique solution of the homogeneous problem (i.e., f = 0) is the 
null polynomial. This is the case since: 


• The null polynomial is a trivial solution.

• If $p \in P_n$ is a solution of the homogeneous problem and if $p(x_0) = 0$, we deduce that
  $p(x_i) = 0$ for all $i = 1,\dots,n$. Hence p is the null polynomial (a nonzero polynomial
  of $P_n$ cannot have more than n distinct zeros). If $p(x_0) \neq 0$, then p has
  alternating signs between two successive $x_i$; hence it vanishes at n + 1
  distinct points and is again the null polynomial.


Writing $p(t) = \sum_{j=0}^{n} p_j t^j$, the equations become

$$
p(x_i) - (-1)^i p(x_0) = f(x_i) - (-1)^i f(x_0), \qquad i = 1,\dots,n+1.
$$

The vector $a = (p_0,\dots,p_n)^T$ solves the system $Aa = b$ with

$$
A_{i,j} = x_i^{\,j} - (-1)^i x_0^{\,j}, \qquad b_i = f(x_i) - (-1)^i f(x_0), \qquad i = 1,\dots,n+1,\quad j = 0,\dots,n.
$$

The implementation is done in the function APP_equiosc. The input data of
this function is a vector containing the values $x_i$. The function computes the
matrix A and the vector b defined above and returns the coefficients $p_j$ of the
polynomial.
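A minimal sketch of such a computation is given below. It is not the function APP_equiosc of the book: the function f, the degree n = 3, and the equidistant points are hypothetical choices, and the system is written in the form $p(x_i) - (-1)^i p(x_0) = f(x_i) - (-1)^i f(x_0)$, which is an assumption about the exact form used in the book.

```matlab
% Sketch: polynomial p of degree <= n such that f - p takes values of equal
% magnitude and alternating signs at the n+2 points x_0,...,x_{n+1}.
f = @(t) exp(t);                  % hypothetical function
n = 3;
x = linspace(0,1,n+2)';           % x_0,...,x_{n+1} stored as x(1),...,x(n+2)
s = ((-1).^(1:n+1))';             % signs (-1)^i, i = 1,...,n+1
A = zeros(n+1);
for j = 0:n
    A(:,j+1) = x(2:end).^j - s*x(1)^j;
end
b = f(x(2:end)) - s*f(x(1));
a = A\b;                          % a = (p_0,...,p_n)'
r = f(x) - polyval(flipud(a), x); % residual f - p at the points
fprintf('f(x_0)-p(x_0) = %g\n', r(1));
```

The residual r should satisfy r(i+1) = (-1)^i r(1) up to rounding errors, which is the equioscillation property at the given points.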


Solution of Exercise 3.11 


The function APP_Remez computes for a given integer n the polynomial of best
uniform approximation of a function f. Three cases are considered for the
initialization of the algorithm: the Chebyshev points, equidistant points, and
randomly chosen points. The parameter tol relaxes the equality constraint
in step k of the algorithm (that is, a test of the form a = b is replaced by
the test |a - b| < tol). For n = 5, n = 10, and n = 15, the best uniform
approximations on [0,1] of the function $x \mapsto \sin(2\pi\cos(\pi x))$ are displayed in
Fig. 3.17.

Fig. 3.17. Best uniform approximation in $P_n$ of $f(x) = \sin(2\pi\cos(\pi x))$. From left
to right: n = 5, n = 10, and n = 15. The function f is plotted with solid lines.


For n = 15, the algorithm initialized with the Chebyshev points converges in 
21 iterations. Initialized with equidistant points, it converges in 28 iterations. 
Initialized with random points, it does not converge (in general) after 100 
iterations. 

The good result obtained with the Chebyshev points can be explained:
since the function f has a Chebyshev expansion, analogous to the Legendre
series in (3.19),

$$
f = \sum_{k=0}^{\infty} f_k T_k,
$$

we can use the approximation

$$
f - \sum_{k=0}^{n} f_k T_k \approx f_{n+1} T_{n+1}
$$

by neglecting the remainder of the expansion. We remark that $T_{n+1}$ equioscillates
over [-1,1] at the n + 2 points $t_i = \cos\left(\frac{i\pi}{n+1}\right)$ $(i = 0,\dots,n+1)$, since
$T_{n+1}(t_i) = \cos(i\pi) = (-1)^i$. Hence $\sum_{k=0}^{n} f_k T_k$ is close to the best uniform
approximation of f in $P_n$ (see Theorem 3.3). This is the reason why the
Chebyshev points are good candidates for the initialization of the Remez algorithm.
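The equioscillation of $T_{n+1}$ is easy to check numerically. The short sketch below uses the identity $T_m(\cos s) = \cos(ms)$; the value n = 5 is an arbitrary choice for the illustration.

```matlab
% Sketch: T_{n+1} takes the values (-1)^i at the points t_i = cos(i*pi/(n+1)).
n = 5;
i = 0:n+1;
t = cos(i*pi/(n+1));              % the n+2 points t_i in [-1,1]
T = cos((n+1)*acos(t));           % T_{n+1}(t), using T_m(cos s) = cos(m s)
disp([i; T]);                     % second row alternates between 1 and -1
```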


Solution of Exercise 3.12 


The script in APP_ls.m computes the least squares approximation on [0,1]
of a given function. The instruction p=polyfit(x,y,n) returns an array p
containing the coefficients of the polynomial of degree less than or equal to
n that fits the values y(i) at the points x(i) in the least squares sense (it
interpolates them when there are exactly n + 1 distinct points). In order to evaluate
this polynomial on a grid of points, we use the MATLAB function polyval.
Running the script in APP_ls.m returns the value n = 10. The polynomial is
displayed in Fig. 3.18 with the points $(x_i, y_i)$ marked by the symbol +.

Fig. 3.18. Least squares approximation in $P_{10}$: polynomial approximation (solid
line) and the points $(x_i, f(x_i))$ (+).


Solution of Exercise 3.13 





We give only the results obtained with the piecewise constant spline, which 
can be found in APP_spline0.m: 
% script spline0.m
n0=10;E=[ ];N=[ ];
for i=1:10,
   n=i*n0;E=[E;APP_errorS0(n)];N=[N;n];
end;
loglog(N,E,'-+');xlabel('log n');ylabel('log Error');
fprintf('slope of the straight line = %g ',...
   log(E(end)/(E(1)))/log(N(end)/N(1)));
function y=APP_errorS0(n)
x=(0:n)'/n;h=1/n;fx=f(x);
% Evaluation of $p_i$ on each interval $[x_i,x_{i+1}]$
y=[ ];
for i=1:n
   Ti=linspace(x(i),x(i+1),20);
   fi=f(Ti);
   Si=f(.5*(x(i)+x(i+1)));
   y=[y norm(Si-fi,'inf')];
end
y=max(y);
function y=f(x)
y=sin(4*pi*x);


Running this script produces Fig. 3.19. The slope of the straight line is
about -0.971325, which is a good approximation of the exact value -1 given
by (3.29).

Fig. 3.19. Piecewise constant spline: curve $\ln n \mapsto \ln \|f - S_0\|_\infty$.


Solution of Exercise 3.14


The script in APP_spline1.m produces Fig. 3.20. The slope of the straight line
is about -1.965, which is a good approximation of the exact value -2 given
by (3.30).

Fig. 3.20. Piecewise affine spline: curve $\ln n \mapsto \ln \|f - S_1\|_\infty$.


Solution of Exercise 3.15 


See the script APP_spline3.m. The results obtained with n = 5 and n = 10 
are displayed in Fig. 3.21. 

















Fig. 3.21. Cubic splines: (a) n = 5 and (b) n = 10.


4


Solving an Advection-Diffusion Equation by a
Finite Element Method


Project Summary 


Level of difficulty: 1 


Keywords: Convection-diffusion equation, finite element method, 
stabilization of a numerical scheme. 
Application fields: Convection and diffusion phenomena. 


In this project we seek a numerical approximation of the solution $u : [0,1] \to \mathbb{R}$
of the following problem:

$$
\begin{cases}
-\varepsilon u''(x) + \lambda u'(x) = f(x), & x \in\, ]0,1[,\\
u(0) = 0,\\
u(1) = 0.
\end{cases}
\tag{4.1}
$$


The function f and the real numbers $\varepsilon > 0$ and $\lambda$ are given in such a way
that there exists a unique continuous solution of this problem. Our aim is to
approximate the solution with a continuous piecewise polynomial function.
The differential equation in the problem (4.1) is an advection-diffusion
equation. It models several phenomena, for example the concentration
of a chemical species transported in a fluid with speed $\lambda$; the parameter
$\varepsilon$ is the diffusivity of the chemical species. The ratio $\theta = \lambda/\varepsilon$ measures the
importance of the advection compared to the diffusion. For large values of this
ratio, the numerical solution of the problem (4.1) is delicate. The production
and the vanishing of the chemical species are modeled by the function f, which
in the general case depends on the unknown u. In this problem, we assume
that f depends only on the position x and we consider $\lambda$ and $\varepsilon$ as constants.


4.1 Variational Formulation of the Problem 


A solution u of the boundary value problem (4.1) is also a solution of the
following problem:

$$
\text{Find } u \in H_0^1(0,1) \text{ such that for all } v \in V:\quad a(u,v) = \int_0^1 f(x)v(x)\,dx.
\tag{4.2}
$$

Here V denotes $H_0^1(0,1)$, the space of functions $v : [0,1] \to \mathbb{R}$ such that the
integrals $\int_0^1 |v|^2$ and $\int_0^1 |v'|^2$ are bounded and $v(0) = v(1) = 0$. The bilinear
form a is defined on $V \times V$ by

$$
a(u,v) = \varepsilon \int_0^1 u'(x)v'(x)\,dx + \lambda \int_0^1 u'(x)v(x)\,dx.
\tag{4.3}
$$





Conversely, every regular solution of (4.2) is also a solution of (4.1). The prob- 
lem (4.2) is a “variational formulation” of (4.1). The finite element method is 
based on the computation of the solution of the variational problem (4.2) 
rather than a direct discretization of equation (4.1) by a finite difference 
method (see Chaps. 1 and 2). 

For a strictly positive integer n, we divide the interval [0,1] into n + 1
subintervals $I_i$. For a positive integer $\ell$, we denote by $P_\ell(I_i)$ the set of algebraic
polynomials of degree less than or equal to $\ell$ on $I_i$ and by $V_h^\ell$ the set of the
continuous functions defined on [0,1] whose restriction to each interval $I_i$
belongs to $P_\ell(I_i)$. Figure 4.1 displays two examples of functions of $V_h^\ell$.

Fig. 4.1. Examples of functions of (a) $V_h^1$ and (b) $V_h^2$.


The finite element method consists in searching for an approximation $u_h \in V_h^\ell$
of the function u, a solution of the following problem (compare to (4.2)):

$$
\text{Find } u_h \in V_h^\ell \text{ such that for all } v_h \in V_h^\ell:\quad a(u_h,v_h) = \int_0^1 f(x)v_h(x)\,dx.
\tag{4.4}
$$

The integrals in the left-hand side of (4.4) are easily computed since they
involve products of polynomials. More difficult is the computation of the
integral $\int_0^1 f(x)v_h(x)\,dx$, whose explicit calculation is rarely possible. In this case
some quadrature rules are necessary. We will make use of two rules:


• the trapezoidal quadrature rule

$$
\int_\alpha^\beta g(x)\,dx \approx \frac{\beta-\alpha}{2}\bigl(g(\alpha) + g(\beta)\bigr).
$$

  This method is of order 1, that is, it is exact for all $g \in P_1([\alpha,\beta])$.

• the Simpson quadrature rule

$$
\int_\alpha^\beta g(x)\,dx \approx \frac{\beta-\alpha}{6}\left(g(\alpha) + 4g\!\left(\frac{\alpha+\beta}{2}\right) + g(\beta)\right).
\tag{4.5}
$$

  This method is of order 3, i.e., it is exact for all $g \in P_3([\alpha,\beta])$.


Using one of these basic quadrature formulas on each subinterval $I_i$, we get a
quadrature formula on the whole interval [0,1].
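The two composite rules can be sketched in a few lines. This is an illustration only: the integrand g(x) = x^3 and the number of subintervals are arbitrary choices, selected so that the Simpson rule is exact while the trapezoidal rule is not.

```matlab
% Sketch: composite trapezoidal and Simpson rules on n+1 subintervals of [0,1].
g = @(x) x.^3;                    % hypothetical integrand, exact integral 1/4
n = 10;
xk = linspace(0,1,n+2);           % endpoints of the n+1 subintervals
a = xk(1:end-1); b = xk(2:end);   % left and right endpoints
T = sum((b-a)/2.*(g(a)+g(b)));                    % trapezoidal rule
S = sum((b-a)/6.*(g(a)+4*g((a+b)/2)+g(b)));       % Simpson rule
fprintf('trapezoid error: %g, Simpson error: %g\n', abs(T-1/4), abs(S-1/4));
```

Since g is a cubic polynomial, the Simpson error is at the level of rounding errors, in agreement with its order 3.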

In this project, we compare two finite element methods (FEM) to solve
the advection-diffusion problem. The first method is called P1 since it uses
functions in $V_h^1$; the second method is a P2 method since the approximation
space for this method is $V_h^2$.

To validate the computations, we have to compare the computed solution 
to the exact one. Generally, this can be done only in some special simple cases. 
The aim of the first exercise is to compute the exact solution of (4.1) in the 
case of constant source terms. 


Exercise 4.1. The exact solution in the case of constant nonzero f. 


1. Derive explicit formulas for the solution u of the problem (4.1). 

2. Prove the existence of $x_\theta \in [0,1]$ depending only on the ratio $\theta = \lambda/\varepsilon$
such that the function $\frac{\lambda}{f}u$ is strictly increasing (respectively decreasing)
over $]0, x_\theta[$ (respectively $]x_\theta, 1[$). Calculate $\lim_{\theta\to\pm\infty} x_\theta$.

3. For $\lambda > 0$ fixed, we are interested in the behavior of the solution u for
$\varepsilon$ going to $0^+$ (and thus $\theta \to +\infty$). Calculate $u(x_\theta)$ and $\lim_{\varepsilon\to 0^+} u(x_\theta)$.
Show that


coe aon) a ge Ae) 





Try to explain the meaning of the sentence "for small values of $\varepsilon$, the
solution of the differential problem (4.1) contains a thin boundary layer
in a neighborhood of the point x = 1".

4. Write a program that computes the values of the solution u on a given
set of points (an array). Plot u for f = 1, $\lambda \in \{-1, 1\}$, and
$\varepsilon \in \{1, 1/2, 10^{-1}, 10^{-2}\}$. Comment on the results.


A solution of this exercise is proposed in Sect. 4.6 at page 97. 


4.2 A P1 Finite Element Method 





For $n \in \mathbb{N}^*$, we define the points

$$
x_k^{(1)} = kh, \qquad k = 0,\dots,n+1,
$$

and the intervals

$$
I_k = [x_k^{(1)}, x_{k+1}^{(1)}], \qquad k = 0,\dots,n,
$$

with grid size h = 1/(n + 1). We also define n "hat functions" $\varphi_{h,k}^{(1)}$
(k = 1,...,n) (see Fig. 4.2), such that

$$
\varphi_{h,k}^{(1)} \in V_h^1 \quad\text{and}\quad \varphi_{h,k}^{(1)}(x_i^{(1)}) = \delta_{i,k}, \qquad \forall i = 1,\dots,n,
$$

with $\delta$ the Kronecker symbol. Note that the support of the function $\varphi_{h,k}^{(1)}$ is
the union of two intervals $I_{k-1}$ and $I_k$.

Fig. 4.2. A hat function $\varphi_{h,k}^{(1)}$ of $V_h^1$.


In the finite element methods, the points $x_k^{(1)}$ are called nodes and the
intervals $I_k$ cells. We seek an approximation $u_h^{(1)} \in V_h^1$ of the function u, a
solution of the problem (4.4) with $\ell = 1$:

$$
\text{Find } u_h^{(1)} \in V_h^1 \text{ such that for all } v_h \in V_h^1:\quad a(u_h^{(1)}, v_h) = \int_0^1 f(x)v_h(x)\,dx.
\tag{4.6}
$$

Exercise 4.2. 


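1. Prove that the functions $(\varphi_{h,m}^{(1)})_{m=1}^{n}$ form a basis of $V_h^1$.
2. Deduce that problem (4.6) is equivalent to the following problem:

$$
\text{Find } u_h^{(1)} \in V_h^1 \text{ such that for all } k = 1,\dots,n:\quad a(u_h^{(1)}, \varphi_{h,k}^{(1)}) = \int_0^1 f(x)\varphi_{h,k}^{(1)}(x)\,dx.
\tag{4.7}
$$

3. By expanding $u_h^{(1)}$ in the basis $(\varphi_{h,m}^{(1)})_{m=1}^{n}$,

$$
u_h^{(1)} = \sum_{m=1}^{n} \alpha_m \varphi_{h,m}^{(1)},
$$

show that $\alpha_m = u_h^{(1)}(x_m^{(1)})$ and that the vector $\tilde u_h^{(1)} = (u_h^{(1)}(x_1^{(1)}),\dots,u_h^{(1)}(x_n^{(1)}))^T$
is a solution of the linear system

$$
A_h^{(1)} \tilde u_h^{(1)} = b_h^{(1)},
\tag{4.8}
$$

where $A_h^{(1)}$ is the real matrix of size $n \times n$ defined by

$$
(A_h^{(1)})_{k,m} = a(\varphi_{h,m}^{(1)}, \varphi_{h,k}^{(1)}), \qquad 1 \le k,m \le n,
$$

and $b_h^{(1)}$ is the vector of $\mathbb{R}^n$ defined by

$$
(b_h^{(1)})_k = \int_0^1 f(x)\varphi_{h,k}^{(1)}(x)\,dx, \qquad 1 \le k \le n.
$$

Show that $A_h^{(1)} = \varepsilon B_h^{(1)} + \lambda C_h^{(1)}$, where $B_h^{(1)}$ is a tridiagonal symmetric
matrix and $C_h^{(1)}$ a tridiagonal antisymmetric matrix.

4. Prove that the symmetric matrix $B_h^{(1)}$ is positive definite, i.e., for all
$x \in \mathbb{R}^n$, $(B_h^{(1)}x, x) \ge 0$, with equality if and only if x is the null vector
((.,.) denotes the Euclidean inner product). This property is very useful in the
numerical analysis of linear problems. It implies, in particular, that the
matrix is invertible.

5. Show that $(A_h^{(1)}x, x) = \varepsilon (B_h^{(1)}x, x)$. Conclude that $A_h^{(1)}$ is invertible.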


A solution of this exercise is proposed in Sect. 4.6 at page 99. 


Consequently, the system (4.7) has a unique solution that will be computed
by solving the linear system (4.8).


Exercise 4.3. Computation of the P1 solution by solving (4.8).


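1. Derive the following explicit formulas for $B_h^{(1)}$ and $C_h^{(1)}$:

$$
B_h^{(1)} = \frac{1}{h}
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0\\
-1 & 2 & -1 & \ddots & \vdots\\
0 & \ddots & \ddots & \ddots & 0\\
\vdots & \ddots & -1 & 2 & -1\\
0 & \cdots & 0 & -1 & 2
\end{pmatrix},
\qquad
C_h^{(1)} = \frac{1}{2}
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
-1 & 0 & 1 & \ddots & \vdots\\
0 & \ddots & \ddots & \ddots & 0\\
\vdots & \ddots & -1 & 0 & 1\\
0 & \cdots & 0 & -1 & 0
\end{pmatrix}.
$$

2. Write a program that computes the matrix $A_h^{(1)}$ (with input data n, $\varepsilon$,
and $\lambda$).

3. Write a program that computes the right-hand side $b_h^{(1)}$ (input data: n
and f). Use the trapezoidal rule to compute the components of the vector
$b_h^{(1)}$.

4. For $\varepsilon = 0.1$, $\lambda = 1$, f = 1, and $n \in \{10, 20\}$, compute the solution $\tilde u_h^{(1)}$ of
(4.8) and compare it to the exact solution.

5. Error analysis. Fix the parameters $\varepsilon = 0.1$, $\lambda = 1$, and f = 1. For n going
from 10 to 100 in steps of 10, draw the curves $\log n \mapsto \log \|e^{(1)}\|_\infty$, with
$e^{(1)} \in \mathbb{R}^n$ the error vector defined by $(e^{(1)})_k = \tilde u_h^{(1)}(k) - u(x_k^{(1)})$. Deduce
a decreasing law for $\|e^{(1)}\|_\infty$ of the form

$$
\|e^{(1)}\|_\infty \approx \mathrm{constant}/n^{s_1} \quad\text{for } n \to +\infty,
$$

with $s_1 > 0$ to be determined.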
A solution of this exercise is proposed in Sect. 4.6 at page 100. 
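As a starting point for this exercise, the P1 system can be assembled and solved in a few lines. The sketch below is not the solution proposed in Sect. 4.6: the variable names and the use of spdiags are choices made for this illustration, and the right-hand side assumes that the trapezoidal rule on each cell gives $(b_h^{(1)})_k \approx h f(x_k^{(1)})$.

```matlab
% Sketch: P1 finite element solution for constant f, compared with the
% exact solution at the nodes.
n = 20; e = 0.1; lambda = 1; fc = 1;
h = 1/(n+1); x = (1:n)'*h;                  % interior nodes
o = ones(n,1);
B = spdiags([-o 2*o -o], -1:1, n, n)/h;     % (1/h) tridiag(-1,2,-1)
C = spdiags([-o 0*o o], -1:1, n, n)/2;      % (1/2) tridiag(-1,0,1)
A = e*B + lambda*C;
b = h*fc*o;                                 % trapezoidal rule on each cell
uh = A\b;                                   % nodal values of the P1 solution
uex = fc/lambda*(x-(1-exp(lambda*x/e))./(1-exp(lambda/e)));
err = norm(uh-uex, inf);
fprintf('n = %d, max error at the nodes = %g\n', n, err);
```

Wrapping these lines in a loop over n and plotting log(err) against log(n) gives the curve requested in question 5.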


The P1 finite element method seems to be well suited to solve the
advection-diffusion problem. Unfortunately, things are not so simple. For $\lambda = 1$, f = 1,
$\varepsilon = 0.01$, and n = 10 we obtain the results displayed in Fig. 4.3. The
oscillatory behavior of $u_h^{(1)}$ shows that it is clearly not a good approximation of
u, especially in the boundary layer. We investigate in the next section whether
a high-order finite element method is able to suppress these oscillations.
The next exercise answers the following question: for a fixed $\varepsilon$, what is the
minimum number of subintervals required to resolve the boundary layer?





Exercise 4.4. Fix $\lambda = 1$ and f = 1. For various "small" values of $\varepsilon$ (for
example in the range [0.005, 0.02]) determine the integer $n = n(\varepsilon)$ from which
the numerical solution seems to be a reasonable approximation of the exact
solution in the boundary layer (i.e., the numerical approximation is not
oscillating and is close to the exact solution in the boundary layer).

Hint: Use the following strategy: for fixed $\varepsilon$, run the program for n = 10, 20, 30,
etc. For each value of n, plot the exact solution and the numerical approxima- 
tion. From these graphs, decide whether the approximation is good. For each 
value of n, compute P = LR, the Peclet number of the grid. What conclusion 
can be drawn from this? 

A solution of this exercise is proposed in Sect. 4.6 at page 102. 





4.3 A P2 Finite Element Method 


For $n \in \mathbb{N}^*$, we set h = 1/(n + 1) and define the points $x_i^{(2)} = ih/2$
(i = 0,...,2(n + 1)). Notice that $x_k^{(1)} = x_{2k}^{(2)}$ and the intervals $I_k = [x_{2k}^{(2)}, x_{2k+2}^{(2)}]$
(k = 0,...,n) are those used in the previous section. In other words, we keep
the same number of intervals and add to each interval a new node, namely
the center of the element. To get a better approximation, we shall associate to
each node of the mesh a piecewise quadratic function rather than a piecewise
affine one.
Fig. 4.3. Approximation of the solution of the advection-diffusion problem by a
P1 finite element method, $\varepsilon = 0.01$, $\lambda = 1$, and n = 10.


We seek an approximation $u_h^{(2)} \in V_h^2$ of the function u, a solution of problem
(4.4) with $\ell = 2$:

$$
\text{Find } u_h^{(2)} \in V_h^2 \text{ such that for all } v_h \in V_h^2:\quad a(u_h^{(2)}, v_h) = \int_0^1 f(x)v_h(x)\,dx.
\tag{4.9}
$$


As in the previous section, we begin by building a simple basis of $V_h^2$. On each
interval $I_k$, we define three quadratic Lagrange polynomials associated with
the points $x_{2k}^{(2)}$, $x_{2k+1}^{(2)}$, and $x_{2k+2}^{(2)}$:

$$
\begin{aligned}
\psi_k^{(-)}(x) &= 2\,(x - x_{2k+1}^{(2)})(x - x_{2k+2}^{(2)})/h^2,\\
\psi_k^{(0)}(x) &= -4\,(x - x_{2k}^{(2)})(x - x_{2k+2}^{(2)})/h^2,\\
\psi_k^{(+)}(x) &= 2\,(x - x_{2k}^{(2)})(x - x_{2k+1}^{(2)})/h^2.
\end{aligned}
$$

To each node $x_k^{(2)}$ of the mesh we associate the function $\varphi_{h,k}^{(2)} \in V_h^2$ defined by
(see also Fig. 4.4)

$$
\varphi_{h,2k+1}^{(2)}(x) =
\begin{cases}
\psi_k^{(0)}(x) & \text{for } x \in I_k,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\varphi_{h,2k}^{(2)}(x) =
\begin{cases}
\psi_k^{(-)}(x) & \text{for } x \in I_k,\\
\psi_{k-1}^{(+)}(x) & \text{for } x \in I_{k-1},\\
0 & \text{otherwise.}
\end{cases}
$$

Notice that the support of the function $\varphi_{h,k}^{(2)}$ is either a single interval or
the union of two adjacent intervals, according to the parity of k. Since
the functions $(\varphi_{h,k}^{(2)})_{k=1}^{2n+1}$ form a basis of $V_h^2$, problem (4.9) is equivalent to the
problem

$$
\text{Find } u_h^{(2)} \in V_h^2 \text{ such that for all } k = 1,\dots,2n+1:\quad a(u_h^{(2)}, \varphi_{h,k}^{(2)}) = \int_0^1 f(x)\varphi_{h,k}^{(2)}(x)\,dx.
\tag{4.10}
$$

Fig. 4.4. Generic functions forming a basis of $V_h^2$.


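Expanding $u_h^{(2)}$ in the basis $(\varphi_{h,m}^{(2)})_{m=1}^{2n+1}$,

$$
u_h^{(2)} = \sum_{m=1}^{2n+1} \alpha_m \varphi_{h,m}^{(2)},
$$

we prove, as in the P1 case, that $\alpha_m = u_h^{(2)}(x_m^{(2)})$ and the vector
$\tilde u_h^{(2)} = (\alpha_1,\dots,\alpha_{2n+1})^T$ solves a linear system

$$
A_h^{(2)} \tilde u_h^{(2)} = b_h^{(2)},
\tag{4.11}
$$

with $A_h^{(2)}$ the $(2n+1) \times (2n+1)$ matrix defined by

$$
(A_h^{(2)})_{k,m} = a(\varphi_{h,m}^{(2)}, \varphi_{h,k}^{(2)}), \qquad 1 \le m,k \le 2n+1,
$$

and $b_h^{(2)}$ the vector of $\mathbb{R}^{2n+1}$ defined by

$$
(b_h^{(2)})_k = \int_0^1 f(x)\varphi_{h,k}^{(2)}(x)\,dx, \qquad 1 \le k \le 2n+1.
$$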


Exercise 4.5. Put $A_h^{(2)} = \varepsilon B_h^{(2)} + \lambda C_h^{(2)}$. Is the matrix $B_h^{(2)}$ symmetric? tridiagonal?
Is the matrix $C_h^{(2)}$ antisymmetric? tridiagonal? Prove the invertibility
of the matrix $A_h^{(2)}$.
A solution of this exercise is proposed in Sect. 4.6 at page 103. 


System (4.10) has thus a unique solution, which we compute by solving the 
linear system (4.11). 


Exercise 4.6. Computation of the P2 solution. 


1. Prove that $B_h^{(2)}$ and $C_h^{(2)}$ have the following pattern (given here for n = 3):

$$
B_h^{(2)} = \frac{1}{3h}
\begin{pmatrix}
16 & -8 & 0 & 0 & 0 & 0 & 0\\
-8 & 14 & -8 & 1 & 0 & 0 & 0\\
0 & -8 & 16 & -8 & 0 & 0 & 0\\
0 & 1 & -8 & 14 & -8 & 1 & 0\\
0 & 0 & 0 & -8 & 16 & -8 & 0\\
0 & 0 & 0 & 1 & -8 & 14 & -8\\
0 & 0 & 0 & 0 & 0 & -8 & 16
\end{pmatrix},
$$

$$
C_h^{(2)} = \frac{1}{6}
\begin{pmatrix}
0 & 4 & 0 & 0 & 0 & 0 & 0\\
-4 & 0 & 4 & -1 & 0 & 0 & 0\\
0 & -4 & 0 & 4 & 0 & 0 & 0\\
0 & 1 & -4 & 0 & 4 & -1 & 0\\
0 & 0 & 0 & -4 & 0 & 4 & 0\\
0 & 0 & 0 & 1 & -4 & 0 & 4\\
0 & 0 & 0 & 0 & 0 & -4 & 0
\end{pmatrix}.
$$

2. Write a program to compute the matrix $A_h^{(2)}$ (input data: n, $\varepsilon$, and $\lambda$).
3. Write a program to compute the right-hand side $b_h^{(2)}$ (input data: n and
f). Use Simpson's rule to compute the components of the vector $b_h^{(2)}$.

4. Error analysis. Fix $\varepsilon = 0.1$, $\lambda = 1$, and f = 1. For n going from 10 to
100 in steps of 10, draw the curves $\log n \mapsto \log \|e^{(2)}\|_\infty$, with $e^{(2)} \in \mathbb{R}^{2n+1}$
the error vector defined by $(e^{(2)})_k = \tilde u_h^{(2)}(k) - u(x_k^{(2)})$. Deduce that the
decreasing law for $\|e^{(2)}\|_\infty$ is of the form

$$
\|e^{(2)}\|_\infty \approx \mathrm{constant}/n^{s_2} \quad\text{for } n \to +\infty,
$$

with $s_2 > 0$ to be computed.


A solution of this exercise is proposed in Sect. 4.6 at page 103. 
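One possible way to obtain these matrices is to assemble them cell by cell. The sketch below is not the program of Sect. 4.6: it assumes the standard 3 x 3 element matrices of quadratic Lagrange elements on a cell of length h (named Be and Ce here for the illustration), assembles them over all the nodes, and then removes the two boundary nodes.

```matlab
% Sketch: assembly of the P2 matrices from 3x3 element matrices.
n = 3; e = 0.1; lambda = 1; h = 1/(n+1);
Be = [7 -8 1; -8 16 -8; 1 -8 7]/(3*h);   % element matrix of int(phi_j' phi_i')
Ce = [-3 4 -1; -4 0 4; 1 -4 3]/6;        % element matrix of int(phi_j' phi_i)
N = 2*n+3;                               % all nodes, boundary included
B = zeros(N); C = zeros(N);
for k = 0:n
    idx = 2*k + (1:3);                   % the three nodes of the cell I_k
    B(idx,idx) = B(idx,idx) + Be;
    C(idx,idx) = C(idx,idx) + Ce;
end
B = B(2:end-1,2:end-1);                  % remove the two boundary nodes
C = C(2:end-1,2:end-1);
A = e*B + lambda*C;                      % (2n+1) x (2n+1) matrix
disp(3*h*B); disp(6*C);                  % integer patterns for n = 3
```

For n = 3 the two displayed arrays can be compared entry by entry with the patterns of question 1.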


4.4 A Stabilization Method 


In this section we propose a method for removing the oscillations that we have
observed in Fig. 4.3. Running the P2 program with the same parameters
($\lambda = 1$, f = 1, $\varepsilon = 0.01$, and n = 10) we obtain Fig. 4.5. At first glance,
the oscillations persist. However, if one is interested only in the values of
the solution at the endpoints of the intervals (the points $x_{2k}^{(2)}$), one gets a
nonoscillatory approximation of the exact solution. We propose to check this
assertion and explain it. First of all, we define a way to compute the values
of $u_h^{(2)}$ at the endpoints of the intervals, without computing the values at
midpoints.


4.4.1 Computation of the Solution at the Endpoints of the 
Intervals 





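Let $\hat A_h^{(2)}$ denote the matrix obtained after permutation of the rows of the
matrix $A_h^{(2)}$ in order to put in the first places the rows with even indices (the
same operation is performed for the columns). In the same way, we define the
vector $\hat b_h^{(2)}$ from the vector $b_h^{(2)}$. We also define the vector $\hat u_h^{(2)}$ from the vector
of unknowns $\tilde u_h^{(2)}$. The system (4.11) is equivalent to the system

$$
\hat A_h^{(2)} \hat u_h^{(2)} = \hat b_h^{(2)},
$$

which one could have obtained directly from the variational formulation by
changing the numbering of the unknowns. Finally, we split $\hat A_h^{(2)}$, $\hat b_h^{(2)}$, and $\hat u_h^{(2)}$
as

$$
\hat A_h^{(2)} = \begin{pmatrix} A & B\\ C & D \end{pmatrix}, \qquad
\hat b_h^{(2)} = \begin{pmatrix} c\\ d \end{pmatrix}, \qquad
\hat u_h^{(2)} = \begin{pmatrix} v\\ w \end{pmatrix},
$$

where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times(n+1)}$, $C \in \mathbb{R}^{(n+1)\times n}$, $D \in \mathbb{R}^{(n+1)\times(n+1)}$, $c \in \mathbb{R}^n$,
$d \in \mathbb{R}^{n+1}$, $v \in \mathbb{R}^n$, $w \in \mathbb{R}^{n+1}$, and

$$
\begin{aligned}
A_{i,j} &= a(\varphi_{h,2j}^{(2)}, \varphi_{h,2i}^{(2)}), & B_{i,j} &= a(\varphi_{h,2j-1}^{(2)}, \varphi_{h,2i}^{(2)}),\\
C_{i,j} &= a(\varphi_{h,2j}^{(2)}, \varphi_{h,2i-1}^{(2)}), & D_{i,j} &= a(\varphi_{h,2j-1}^{(2)}, \varphi_{h,2i-1}^{(2)}),\\
c_i &= \int_0^1 f(x)\varphi_{h,2i}^{(2)}(x)\,dx, & d_i &= \int_0^1 f(x)\varphi_{h,2i-1}^{(2)}(x)\,dx,\\
v_i &= u_h^{(2)}(x_{2i}^{(2)}), & w_i &= u_h^{(2)}(x_{2i-1}^{(2)}).
\end{aligned}
$$

Fig. 4.5. Approximation of the solution of the advection-diffusion problem for
$\varepsilon = 0.01$, $\lambda = 1$, and n = 10. (a) P2 finite element solution, (b) the same solution
displayed only at the endpoints of the intervals.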

It is easy to check that 


• the matrix A is tridiagonal,
• the matrix B is upper triangular and bidiagonal,
• the matrix C is lower triangular and bidiagonal,
• the matrix D is diagonal.


The unknowns v and w are solutions of the linear system

$$
\begin{cases}
Av + Bw = c,\\
Cv + Dw = d.
\end{cases}
\tag{4.12}
$$

The diagonal components of the matrix D are

$$
D_{k,k} = a(\varphi_{h,2k-1}^{(2)}, \varphi_{h,2k-1}^{(2)}) = \varepsilon \int_{x_{2k-2}^{(2)}}^{x_{2k}^{(2)}} \bigl[(\varphi_{h,2k-1}^{(2)})'(x)\bigr]^2 dx > 0,
$$

and consequently the matrix D is invertible. From the second equation of the
system (4.12), we get $w = D^{-1}(d - Cv)$. Plugging this expression for w into
the first equation, we obtain

$$
(A - BD^{-1}C)\,v = c - BD^{-1}d.
\tag{4.13}
$$

This equation allows us to compute the vector v (i.e., the components of the
P2 solution at the endpoints of the subintervals). Let us note that the matrix
$A - BD^{-1}C$ is tridiagonal, whereas the matrix $A_h^{(2)}$ is pentadiagonal.
This method, which consists in isolating, in a linear system, some of the unknowns
that solve another simpler system, is called condensation. It is simply
a Gaussian elimination of the unknowns associated with the centers of the
subintervals.
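The condensation can be checked numerically by comparing the solution of the condensed system with the even-indexed components of the solution of the full P2 system. The sketch below is an illustration under stated assumptions: the P2 matrix is assembled from the standard 3 x 3 element matrices of quadratic elements (Be and Ce are names chosen here), f is constant, and the right-hand side uses Simpson's rule on each cell.

```matlab
% Sketch: condensation of the midpoint unknowns of the P2 system.
n = 10; e = 0.01; lambda = 1; fc = 1; h = 1/(n+1);
Be = [7 -8 1; -8 16 -8; 1 -8 7]/(3*h);   % element diffusion matrix
Ce = [-3 4 -1; -4 0 4; 1 -4 3]/6;        % element advection matrix
N = 2*n+3; A2 = zeros(N);
for k = 0:n
    idx = 2*k + (1:3);
    A2(idx,idx) = A2(idx,idx) + e*Be + lambda*Ce;
end
A2 = A2(2:end-1,2:end-1);                % interior nodes only
b2 = zeros(2*n+1,1);
b2(1:2:end) = 2*h/3*fc;                  % midpoints (Simpson's rule)
b2(2:2:end) = h/3*fc;                    % endpoints (Simpson's rule)
ev = 2:2:2*n; od = 1:2:2*n+1;            % even (endpoints), odd (midpoints)
A = A2(ev,ev); B = A2(ev,od); C = A2(od,ev); D = A2(od,od);
c = b2(ev); d = b2(od);
v = (A - B*(D\C)) \ (c - B*(D\d));       % condensed system (4.13)
u = A2\b2;                               % full system (4.11)
gap = norm(v - u(ev), inf);
fprintf('difference between the two computations: %g\n', gap);
```

Both computations give the same endpoint values up to rounding errors; the condensed system only involves an n x n tridiagonal matrix.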


Exercise 4.7. 


1. Show that the matrix $A - BD^{-1}C$ is invertible.

2. Compute the matrices A, B, C, and D by extraction of rows and columns
of the matrix $A_h^{(2)}$ computed in Exercise 4.6.

3. Fix $\lambda = 1$, f = 1, and $\varepsilon = 0.01$. For n = 10 and n = 20, solve the problem
(4.13). In the same figure:
• plot as a solid line the exact solution computed for 100 points in [0,1],
• plot using some symbols the numerical solution, i.e., the components
  of the vector v.

4. For n = 20, compare to the P1 method. What is the minimum value $n_0$
starting from which the P1 method gives the same quality of approximation
(a visual appreciation is enough)?


A solution of this exercise is proposed in Sect. 4.6 at page 105. 

In conclusion, the P2 method, used only for the endpoints of the subintervals,
provides good results. In the next section, we justify these observations.


4.4.2 Analysis of the Stabilized Method


Let us consider only the values $u_h^{(2)}(x_{2k}^{(2)})$, i.e., the values of $u_h^{(2)}$ on the grid
of the P1 method. We prove that these values can be computed by a slightly
modified P1 scheme. More precisely, setting

$$
A_h^{(S)} = A - BD^{-1}C,
$$

the following result holds.


Proposition 4.1. The matrix $A_h^{(S)}$ equals the matrix $A_h^{(1)}$ in which $\varepsilon$ is
replaced by $\varepsilon' = \varepsilon + \lambda^2 h^2/(12\varepsilon)$.
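One could understand this result as an addition of a viscosity term to the original
scheme that makes the solution smoother (less oscillatory). The remainder
of the section is devoted to proving Proposition 4.1. In order to compute the
matrix $A_h^{(S)}$ we denote by X = Tridiag(a,b,c;n,m) the tridiagonal $n \times m$
matrix X defined by

$$
X_{i,i-1} = a, \quad X_{i,i} = b, \quad X_{i,i+1} = c, \quad \text{if these indices are defined.}
$$

Notice that Tridiag(a,b,c;n,m) is not necessarily a square matrix. The reader
who enjoys long calculations will prove that

$$
\begin{aligned}
A &= \frac{\varepsilon}{3h}\,\mathrm{Tridiag}(1,14,1;n,n) + \frac{\lambda}{6}\,\mathrm{Tridiag}(1,0,-1;n,n),\\
B &= -\frac{8\varepsilon}{3h}\,\mathrm{Tridiag}(0,1,1;n,n+1) - \frac{2\lambda}{3}\,\mathrm{Tridiag}(0,1,-1;n,n+1),\\
C &= -\frac{8\varepsilon}{3h}\,\mathrm{Tridiag}(1,1,0;n+1,n) + \frac{2\lambda}{3}\,\mathrm{Tridiag}(-1,1,0;n+1,n),\\
D &= \frac{16\varepsilon}{3h}\,\mathrm{Tridiag}(0,1,0;n+1,n+1),\\
BC &= \frac{\sigma^2}{4}\,\mathrm{Tridiag}(1,2,1;n,n) + \frac{\sigma\lambda}{3}\,\mathrm{Tridiag}(2,0,-2;n,n)
      - \frac{4\lambda^2}{9}\,\mathrm{Tridiag}(-1,2,-1;n,n),
\end{aligned}
$$

where $\sigma = \frac{16\varepsilon}{3h}$. The reader may also calculate

$$
A_h^{(S)} = A - \frac{1}{\sigma}BC = \mathrm{Tridiag}(\alpha,\beta,\gamma;n,n),
$$

where

$$
\alpha = -\frac{\varepsilon}{h} - \frac{\lambda}{2} - \frac{\lambda^2 h}{12\varepsilon}, \qquad
\beta = \frac{2\varepsilon}{h} + \frac{\lambda^2 h}{6\varepsilon}, \qquad
\gamma = -\frac{\varepsilon}{h} + \frac{\lambda}{2} - \frac{\lambda^2 h}{12\varepsilon}.
$$

Finally

$$
A_h^{(S)} = \frac{\varepsilon'}{h}\,\mathrm{Tridiag}(-1,2,-1;n,n) + \frac{\lambda}{2}\,\mathrm{Tridiag}(-1,0,1;n,n),
$$

and the proposition follows since (see Exercise 4.3)

$$
A_h^{(1)} = \frac{\varepsilon}{h}\,\mathrm{Tridiag}(-1,2,-1;n,n) + \frac{\lambda}{2}\,\mathrm{Tridiag}(-1,0,1;n,n).
$$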
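Proposition 4.1 can be checked numerically before attempting the long calculation. The sketch below is an illustration only: the helper Tri is a hypothetical implementation of Tridiag(a,b,c;n,m), and the block formulas for A, B, C, and D (in particular their signs) are the ones assumed in this sketch.

```matlab
% Sketch: numerical check of A - B*inv(D)*C against the P1 matrix with
% modified diffusivity e1 = e + lambda^2*h^2/(12*e).
% Tri(a,b,c,n,m): n x m matrix with a on the subdiagonal, b on the
% diagonal, and c on the superdiagonal (hypothetical helper).
Tri = @(a,b,c,n,m) b*eye(n,m) + a*[zeros(1,m); eye(n-1,m)] ...
                 + c*[zeros(n,1) eye(n,m-1)];
n = 6; e = 0.01; lambda = 1; h = 1/(n+1);
A = e/(3*h)*Tri(1,14,1,n,n) + lambda/6*Tri(1,0,-1,n,n);
B = -8*e/(3*h)*Tri(0,1,1,n,n+1) - 2*lambda/3*Tri(0,1,-1,n,n+1);
C = -8*e/(3*h)*Tri(1,1,0,n+1,n) + 2*lambda/3*Tri(-1,1,0,n+1,n);
D = 16*e/(3*h)*eye(n+1);
AS = A - B*(D\C);                          % condensed matrix
e1 = e + lambda^2*h^2/(12*e);              % modified diffusivity
A1 = e1/h*Tri(-1,2,-1,n,n) + lambda/2*Tri(-1,0,1,n,n);
gap = norm(AS - A1, inf);
fprintf('|| A - B D^{-1} C - A1 ||_inf = %g\n', gap);
```

The difference is at the level of rounding errors for any choice of n, e, and lambda, which is what the proposition states.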


4.5 The Case of a Variable Source Term


We consider in this last section the advection-diffusion equation (4.1) with a 
nonconstant source term f in order to understand the effect of this term on 
the existence of a boundary layer. 


Exercise 4.8. 


1. Fix $\varepsilon = 0.01$, $\lambda = 1$, and $f(x) = \cos(a\pi x)$, $a \in \mathbb{R}$. For a = 0, we already
know the existence of a boundary layer near the endpoint x = 1. We
assume in this exercise that a > 0. For several values of a = 1,2,3,...,
compute (by any of the previous schemes) the solution of the equation


(4.1). (Hint: take n large enough to avoid oscillations in the numerical 


solution.) Comment on the results. Same questions for a = 2 


5: 
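2. Calculation of the exact solution. Define for $x \in [0,1]$,

$$
F_\theta(x) = \int_0^x e^{\theta z}\left(\int_0^z f(y)\,e^{-\theta y}\,dy\right)dz.
$$

(a) Show that for all real numbers $\alpha$ and $\beta$, the function $\alpha + \beta e^{\theta x} - \frac{1}{\varepsilon}F_\theta$
    is a solution of the differential equation (4.1).

(b) Determine $\alpha$ and $\beta$ such that $u = \alpha + \beta e^{\theta x} - \frac{1}{\varepsilon}F_\theta$ is a solution of
    problem (4.1), i.e., it satisfies the differential equation and the boundary
    conditions.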

(c) For f(x) = cos(arx) with a € R*, prove that 


li = — si . 
lim, u(x) sin(arx) 


3. Explain the results obtained in question 1. 


A solution of this exercise is proposed in Sect. 4.6 at page 106. 
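Solution of Exercise 4.1


Computation of the exact solution.


1. For a constant function f, the solution of problem (4.1) is

$$u(x) = \frac{f}{\lambda}\left(x - \frac{e^{\lambda x/\varepsilon} - 1}{e^{\lambda/\varepsilon} - 1}\right).$$

2. With θ = λ/ε, we rewrite u as

$$u(x) = \frac{f}{\lambda}\left(x - \frac{e^{\theta x} - 1}{e^{\theta} - 1}\right).$$

Hence

$$\frac{\lambda}{f}\,u'(x) = 0 \iff 1 - \frac{\theta e^{\theta x}}{e^{\theta}-1} = 0 \iff \theta e^{\theta x} = e^{\theta} - 1 \iff x = \frac{1}{\theta}\ln\frac{e^{\theta}-1}{\theta}.$$

From this we deduce that $x_\theta = \frac{1}{\theta}\ln\frac{e^{\theta}-1}{\theta} = 1 + \frac{1}{\theta}\ln\frac{1-e^{-\theta}}{\theta} \in (0,1)$ and

$$u'(x) > 0 \iff e^{\theta} - 1 - \theta e^{\theta x} > 0 \iff \theta e^{\theta x} < \theta e^{\theta x_\theta} \iff x < x_\theta.$$

The limits are $\lim_{\theta\to+\infty} x_\theta = 1$ and $\lim_{\theta\to-\infty} x_\theta = 0$.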


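3. It is easy to check that

$$\lim_{x\to 1}\,\lim_{\varepsilon\to 0^+} u(x) = \frac{f}{\lambda} \neq 0 = \lim_{\varepsilon\to 0^+}\,\lim_{x\to 1} u(x).$$

The “boundary layer” is due to the strong variation of the solution (from
f/λ to 0) over a small interval $[x_\theta, 1]$ whose length $1 - x_\theta = \frac{1}{\theta}\ln\frac{\theta}{1-e^{-\theta}}$
goes to 0 as θ goes to +∞.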

The following function FEM_ConvecDiffSolExa computes the exact solution
of the problem:

function y=FEM_ConvecDiffSolExa(e,lambda,fc,x)
% solution of the convection-diffusion problem
% case epsilon, lambda, and f constant
y=fc/lambda*(x-(1-exp(lambda*x/e))./(1-exp(lambda/e)));


We display in Fig. 4.6 the solutions for f = 1, ε ∈ {1, 1/2, 10^{-1}, 10^{-2}},
and (a) λ = −1 and (b) λ = 1. For small values of ε, we observe a
small interval near x = 1 or x = 0 (according to the sign of λ) in which
the solution suddenly changes from a value close to 1 to zero; this is the
boundary layer.
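
The location of the maximum of u can be compared with the closed-form abscissa 1 + ln((1 − e^{−θ})/θ)/θ, θ = λ/ε. The lines below are a sketch only: the anonymous function uex repeats the formula of FEM_ConvecDiffSolExa so that the example is self-contained, and the grid size and the list of ε values are assumptions chosen for illustration (λ > 0 is assumed).

```matlab
lambda = 1; fc = 1;                      % assumed data, lambda > 0
% same formula as in FEM_ConvecDiffSolExa, repeated to be self-contained
uex = @(e,x) fc/lambda*(x-(1-exp(lambda*x/e))./(1-exp(lambda/e)));

epsvals = [1 0.5 0.1 0.01];
x  = linspace(0,1,20001);                % fine grid used to locate the maximum
xt = zeros(size(epsvals));               % closed-form abscissa of the maximum
xm = zeros(size(epsvals));               % abscissa found on the grid
for k = 1:numel(epsvals)
    theta = lambda/epsvals(k);
    xt(k) = 1 + log((1-exp(-theta))/theta)/theta;
    [umax,imax] = max(uex(epsvals(k),x));
    xm(k) = x(imax);
end
layer = 1 - xt;                          % width of the interval [x_theta,1]
```

The vector layer shrinks as ε decreases, which is the boundary layer described above.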





























Fig. 4.6. Solution of the convection-diffusion problem for (a) λ = −1 and (b) λ = 1.
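Solution of Exercise 4.2


1. The linear space $V_h^1$ is of dimension n since there is an obvious isomorphism
from this space onto $\mathbb{R}^n$: to each $u = (u_1,\dots,u_n)^T \in \mathbb{R}^n$ corresponds
a function $v \in V_h^1$ defined by $v(x_i) = u_i$. From the identities
$\varphi^{(1)}_{h,k}(x_j) = \delta_{j,k}$, we deduce that the functions $(\varphi^{(1)}_{h,k})_{k=1}^n$ are linearly independent:

$$\sum_{k=1}^n c_k\,\varphi^{(1)}_{h,k}(x) = 0 \ \ \forall x \implies \sum_{k=1}^n c_k\,\varphi^{(1)}_{h,k}(x_j) = c_j = 0 \ \ \forall j.$$

Since $V_h^1$ is of dimension n, the $(\varphi^{(1)}_{h,k})_{k=1}^n$ form a basis. Here are analytical
expressions for $\varphi^{(1)}_{h,k}$ and its derivative:

• for $x \notin I_{k-1}\cup I_k$, $\varphi^{(1)}_{h,k}(x) = (\varphi^{(1)}_{h,k})'(x) = 0$,
• for $x \in I_{k-1}$, $\varphi^{(1)}_{h,k}(x) = (x - x_{k-1})/h$ and $(\varphi^{(1)}_{h,k})'(x) = 1/h$,
• for $x \in I_k$, $\varphi^{(1)}_{h,k}(x) = (x_{k+1} - x)/h$ and $(\varphi^{(1)}_{h,k})'(x) = -1/h$.

2. Since $(\varphi^{(1)}_{h,j})_{j=1}^n$ form a basis of $V_h^1$, we can replace in (4.6) “$v_h \in V_h^1$” by “$v_h$
any element of the basis $(\varphi^{(1)}_{h,j})_{j=1}^n$”.

3. Using again the identities $\varphi^{(1)}_{h,k}(x_j) = \delta_{j,k}$, we get

$$u_h(x_j) = \sum_{k=1}^n \alpha_k\,\varphi^{(1)}_{h,k}(x_j) = \alpha_j.$$

Replacing in (4.7) $u_h$ by its expansion in the basis $(\varphi^{(1)}_{h,k})_{k=1}^n$, we get

$$\sum_{k=1}^n \alpha_k \int_0^1 \left(\varepsilon\,(\varphi^{(1)}_{h,k})'(\varphi^{(1)}_{h,j})' + \lambda\,(\varphi^{(1)}_{h,k})'\,\varphi^{(1)}_{h,j}\right)dx = \int_0^1 f\,\varphi^{(1)}_{h,j}\,dx, \qquad \forall j = 1,\dots,n.$$

Let $A_h^{(1)}$ be the n × n matrix and $b_h^{(1)}$ the vector of $\mathbb{R}^n$ defined by

$$(A_h^{(1)})_{j,k} = \int_0^1 \left(\varepsilon\,(\varphi^{(1)}_{h,k})'(\varphi^{(1)}_{h,j})' + \lambda\,(\varphi^{(1)}_{h,k})'\,\varphi^{(1)}_{h,j}\right)dx, \qquad (b_h^{(1)})_j = \int_0^1 f\,\varphi^{(1)}_{h,j}\,dx.$$

The vector $\tilde u_h = (\alpha_1,\dots,\alpha_n)^T$ solves the linear system $A_h^{(1)}\tilde u_h = b_h^{(1)}$.
The matrix of this system can be split as $A_h^{(1)} = \varepsilon B_h^{(1)} + \lambda C_h^{(1)}$, where

$$(B_h^{(1)})_{j,k} = \int_0^1 (\varphi^{(1)}_{h,j})'(\varphi^{(1)}_{h,k})'\,dx \quad\text{and}\quad (C_h^{(1)})_{j,k} = \int_0^1 (\varphi^{(1)}_{h,k})'\,\varphi^{(1)}_{h,j}\,dx.$$

Note the symmetry of the first matrix, $(B_h^{(1)})_{j,k} = (B_h^{(1)})_{k,j}$. Using an
integration by parts and the fact that the basis functions are null for x = 0
and x = 1 yields $(C_h^{(1)})_{j,k} = -(C_h^{(1)})_{k,j}$. The matrices are tridiagonal since
the supports of any two functions $\varphi^{(1)}_{h,j}$ and $\varphi^{(1)}_{h,k}$ are disjoint for |j − k| > 1.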
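4. Consider $x \in \mathbb{R}^n$ with components $(x_k)_{k=1}^n$. We get

$$(B_h^{(1)}x, x) = \sum_{k=1}^n (B_h^{(1)}x)_k\,x_k = \sum_{k=1}^n\sum_{j=1}^n (B_h^{(1)})_{k,j}\,x_j x_k
= \sum_{k=1}^n\sum_{j=1}^n x_j x_k \int_0^1 (\varphi^{(1)}_{h,k})'(x)\,(\varphi^{(1)}_{h,j})'(x)\,dx
= \int_0^1 \Big|\sum_{k=1}^n x_k\,(\varphi^{(1)}_{h,k})'(x)\Big|^2 dx \ \ge 0.$$

Moreover, $(B_h^{(1)}x, x) = 0$ implies that

$$\sum_{k=1}^n x_k\,(\varphi^{(1)}_{h,k})'(x) = 0$$

for all x ∈ [0, 1]; the function $\sum_k x_k\varphi^{(1)}_{h,k}$ is then constant and vanishes at x = 0,
thus the $x_k$ are all zero. Consequently, the symmetric
matrix $B_h^{(1)}$ is positive definite.


5. From the antisymmetry of the matrix $C_h^{(1)}$ we get

$$(C_h^{(1)}x, x) = (x, (C_h^{(1)})^T x) = -(x, C_h^{(1)}x) = -(C_h^{(1)}x, x),$$

that is, $(C_h^{(1)}x, x) = 0$, and the result follows.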


Solution of Exercise 4.3 


Numerical computation of the P1 solution. 


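1. Computation of $A_h^{(1)}$. Recall that the supports of two sufficiently distant
basis functions $\varphi^{(1)}_{h,j}$ and $\varphi^{(1)}_{h,k}$ are disjoint. More precisely, defining
$b_{k,j} = \int_0^1 (\varphi^{(1)}_{h,k})'(\varphi^{(1)}_{h,j})'\,dx$ and $c_{k,j} = \int_0^1 (\varphi^{(1)}_{h,j})'\,\varphi^{(1)}_{h,k}\,dx$, we get

• for |k − j| > 1, $b_{k,j} = c_{k,j} = 0$,
• for k = j,

$$b_{j,j} = \int_{I_{j-1}} |(\varphi^{(1)}_{h,j})'|^2\,dx + \int_{I_j} |(\varphi^{(1)}_{h,j})'|^2\,dx = \frac{2}{h},$$

$$c_{j,j} = \int_0^1 (\varphi^{(1)}_{h,j})'\,\varphi^{(1)}_{h,j}\,dx = \int_{I_{j-1}} (\varphi^{(1)}_{h,j})'\,\varphi^{(1)}_{h,j}\,dx + \int_{I_j} (\varphi^{(1)}_{h,j})'\,\varphi^{(1)}_{h,j}\,dx = 0,$$

• for k = j + 1,

$$b_{j+1,j} = b_{j,j+1} = \int_{I_j} (\varphi^{(1)}_{h,j+1})'(\varphi^{(1)}_{h,j})'\,dx = -\frac{1}{h},$$

$$c_{j+1,j} = -c_{j,j+1} = \int_{I_j} (\varphi^{(1)}_{h,j})'\,\varphi^{(1)}_{h,j+1}\,dx = -\frac{1}{2}.$$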
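2. See the function FEM_ConvecDiffAP1.
3. Using the trapezoidal rule, we get

$$(b_h^{(1)})_k = \int_0^1 f\,\varphi^{(1)}_{h,k}\,dx = \int_{I_{k-1}} f\,\varphi^{(1)}_{h,k}\,dx + \int_{I_k} f\,\varphi^{(1)}_{h,k}\,dx$$

$$\approx \frac{h}{2}\left[f(x_{k-1})\varphi^{(1)}_{h,k}(x_{k-1}) + f(x_k)\varphi^{(1)}_{h,k}(x_k)\right] + \frac{h}{2}\left[f(x_k)\varphi^{(1)}_{h,k}(x_k) + f(x_{k+1})\varphi^{(1)}_{h,k}(x_{k+1})\right] = h\,f(x_k).$$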


See the function FEM_ConvecDiffbP1. 
4. The following MATLAB script returns the results displayed in Fig. 4.7:

eps=0.1;lambda=1;                           % physical parameters
f = inline('ones(size(x))');                % right-hand side of the equation
n=10;
A=FEM_ConvecDiffAP1(eps,lambda,n);          % matrix of the linear system
b=FEM_ConvecDiffbP1(n,f);                   % right-hand side of the linear system
u=A\b;                                      % FEM solution
u=[0;u;0];                                  % add to u the boundary values
x=(0:n+1)/(n+1);                            % mesh
uexa=FEM_ConvecDiffSolExa(eps,lambda,1,x);  % exact solution computed on x
plot(x,uexa,x,u,'+-r')





























Fig. 4.7. Approximation of the convection-diffusion problem (P1 FEM), ε = 0.1,
λ = 1, (a) n = 10 and (b) n = 20.


For the tested values of λ and ε, we get a good approximation. This script
is written in the file FEM_ConvecDiffscript1.m.
5. Analysis of the error. Figure 4.8 is generated by the following script:

error=[ ];N=[ ];
for n=10:10:100
   A=FEM_ConvecDiffAP1(eps,lambda,n);
   b=FEM_ConvecDiffbP1(n,f);
   u=A\b;
   u=[0;u;0];
   x=(0:n+1)'/(n+1);
   uexa=FEM_ConvecDiffSolExa(eps,lambda,1,x);
   N=[N;n];error=[error; norm(uexa-u,'inf')];
end
plot(log(N),log(error),'+-');


We see in Fig. 4.8 a straight line of slope approximately −2. We deduce
that $s_1 = 2$. If we refine the mesh by a factor of two, changing h into h/2, the error
is divided by $2^{s_1} = 4$.
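
The slope can also be estimated by a least-squares fit instead of being read off the figure. The following lines are a self-contained sketch, not the book's script: the P1 matrix is assembled directly from the tridiagonal form (ε/h)Tridiag(−1,2,−1) + (λ/2)Tridiag(−1,0,1) rather than through FEM_ConvecDiffAP1, the variable epsil replaces eps, and the use of polyfit is an assumption made for illustration.

```matlab
epsil = 0.1; lambda = 1;                 % same parameters as in Fig. 4.8
% exact solution for f = 1 (formula of FEM_ConvecDiffSolExa)
uex = @(x) (x-(1-exp(lambda*x/epsil))./(1-exp(lambda/epsil)))/lambda;

N = (10:10:100)'; err = zeros(size(N));
for k = 1:numel(N)
    n = N(k); h = 1/(n+1); e1 = ones(n,1);
    % P1 matrix: (eps/h)*Tridiag(-1,2,-1) + (lambda/2)*Tridiag(-1,0,1)
    A = epsil/h*spdiags([-e1 2*e1 -e1],-1:1,n,n) ...
        + lambda/2*spdiags([-e1 0*e1 e1],-1:1,n,n);
    b = h*e1;                            % trapezoidal rule with f = 1
    u = [0; A\b; 0];
    x = (0:n+1)'/(n+1);
    err(k) = norm(uex(x)-u,inf);
end
p = polyfit(log(N),log(err),1);          % straight-line fit in log-log scale
slope = p(1);
```

The fitted value slope should be close to −2, in agreement with the figure.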

See the script FEM_ConvecDiffscript2.m. 








Fig. 4.8. Approximation of the convection-diffusion problem (P1 FEM). Logarithm
of the error versus the logarithm of n (ε = 0.1, λ = 1).


Solution of Exercise 4.4 


See the script FEM_ConvecDiffscript3.m. 


eps=0.01;lambda=1;
f = inline('ones(size(x))');
yes=1;
while yes
   n=input('enter n : ');
   A=FEM_ConvecDiffAP1(eps,lambda,n);
   b=FEM_ConvecDiffbP1(n,f);
   u=A\b;
   u=[0;u;0];
   x=(0:n+1)/(n+1);
   uexa=FEM_ConvecDiffSolExa(eps,lambda,1,x);
   plot(x,uexa,x,u,'+-r')
   Peclet=abs(lambda)/2/eps/(n+1)
   yes=input('more ? yes=1, no=0 ? ');
end


Note that the approximation is good for a Peclet number Pe < 1.


Solution of Exercise 4.5 


In the P1 method, the matrices $B_h^{(1)}$ and $C_h^{(1)}$ are tridiagonal. In the P2
method the supports of the basis functions are larger and the matrices $B_h^{(2)}$
and $C_h^{(2)}$ are pentadiagonal. As in the P1 case, we can prove that the matrix
$B_h^{(2)}$ is symmetric, positive definite; the matrix $C_h^{(2)}$ is antisymmetric; and the
matrix $A_h^{(2)}$ is invertible.


Solution of Exercise 4.6 


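The derivatives of the basis functions are

$$\left(\varphi^{(2)}_{h,2k+1}\right)'(x) = \begin{cases} -8\,(x - x_{k+1/2})/h^2 & \text{for } x \in I_k,\\ 0 & \text{otherwise,}\end{cases}$$

where $x_{k+1/2}$ is the midpoint of $I_k$, and

$$\left(\varphi^{(2)}_{h,2k}\right)'(x) = \begin{cases} 4\,(x - x_k + 3h/4)/h^2 & \text{for } x \in I_{k-1},\\ 4\,(x - x_k - 3h/4)/h^2 & \text{for } x \in I_k,\\ 0 & \text{otherwise.}\end{cases}$$


1. (a) Computation of $B_h^{(2)}$. Since the matrix is symmetric, only its upper
triangular part is computed.
• Rows with odd indices.
  • $(B_h^{(2)})_{2k+1,2k+1} = \int_0^1 |(\varphi^{(2)}_{h,2k+1})'|^2\,dx = \int_{I_k} |(\varphi^{(2)}_{h,2k+1})'|^2\,dx = \frac{16}{3h}$,
  • $(B_h^{(2)})_{2k+1,2k+2} = -\frac{8}{3h}$,
  • $(B_h^{(2)})_{2k+1,m} = 0$, ∀m ≥ 2k + 3.
• Rows with even indices.
  • $(B_h^{(2)})_{2k,2k} = \frac{14}{3h}$,
  • $(B_h^{(2)})_{2k,2k+1} = -\frac{8}{3h}$,
  • $(B_h^{(2)})_{2k,2k+2} = \frac{1}{3h}$,
  • $(B_h^{(2)})_{2k,m} = 0$, ∀m ≥ 2k + 3.

Thus, the upper triangular part of the matrix $B_h^{(2)}$ is

$$\frac{1}{3h}\begin{pmatrix}
16 & -8 & 0 & 0 & 0 & \cdots\\
 & 14 & -8 & 1 & 0 & \cdots\\
 & & 16 & -8 & 0 & \cdots\\
 & & & 14 & -8 & \cdots\\
 & & & & \ddots &
\end{pmatrix}.$$

(b) Computation of $C_h^{(2)}$. The matrix being antisymmetric, only its upper
triangular part is computed.
• Rows with odd indices.
  • $(C_h^{(2)})_{2k+1,2k+1} = 0$,
  • $(C_h^{(2)})_{2k+1,2k+2} = \frac{2}{3}$,
  • $(C_h^{(2)})_{2k+1,m} = 0$, ∀m ≥ 2k + 3.
• Rows with even indices.
  • $(C_h^{(2)})_{2k,2k} = 0$,
  • $(C_h^{(2)})_{2k,2k+1} = \frac{2}{3}$,
  • $(C_h^{(2)})_{2k,2k+2} = -\frac{1}{6}$,
  • $(C_h^{(2)})_{2k,m} = 0$, ∀m ≥ 2k + 3.

Thus, the upper triangular part of the matrix $C_h^{(2)}$ is

$$\frac{1}{6}\begin{pmatrix}
0 & 4 & 0 & 0 & 0 & \cdots\\
 & 0 & 4 & -1 & 0 & \cdots\\
 & & 0 & 4 & 0 & \cdots\\
 & & & 0 & 4 & \cdots\\
 & & & & \ddots &
\end{pmatrix}.$$

2. See the function FEM_ConvecDiffAP2.
3. For the odd indices,

$$(b_h^{(2)})_{2k+1} = \int_0^1 f\,\varphi^{(2)}_{h,2k+1}\,dx = \int_{I_k} f\,\varphi^{(2)}_{h,2k+1}\,dx.$$

Using Simpson’s formula on each interval, we obtain

$$(b_h^{(2)})_{2k+1} \approx \frac{2}{3}\,h\,f(x_{k+1/2}).$$

In the same way, $(b_h^{(2)})_{2k} \approx \frac{1}{3}\,h\,f(x_k)$; see the function FEM_ConvecDiffbP2.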
2k 


4. As for Exercise 4.3, we plot in Fig. 4.9 the logarithm of the error versus 
the logarithm of n. The curve is a straight line of slope approximately 
—3.3. The error decreases faster than in the P1 case. 

Remark. For a more relevant comparison of the methods P1 and P2, 
consider the following quantity: 
$$\int_0^1 |u_h(x) - u(x)|^2\,dx = \sum_{\text{intervals } I_k}\int_{I_k} |u_h(x) - u(x)|^2\,dx.$$





We refer the reader to the references at the end of this chapter. 














Fig. 4.9. Approximation of the convection-diffusion problem (P2 FEM). Logarithm
of the error versus the logarithm of n (ε = 0.1, λ = 1).


Solution of Exercise 4.7 


1. Let $x \in \mathbb{R}^n$ be a nonnull vector such that $(A - BD^{-1}C)x = 0$. The nonnull
vector $y = (x^T, -(D^{-1}Cx)^T)^T \in \mathbb{R}^{2n+1}$ is such that $A_h^{(2)}y = 0$. However,
the matrix $A_h^{(2)}$ is invertible. This leads to a contradiction. Hence the
square matrix $A - BD^{-1}C$ is injective and consequently invertible.

2. The four blocks are extracted from the P2 matrix as follows:

A=FEM_ConvecDiffAP2(eps,lambda,n);
a=A(2:2:2*n+1,2:2:2*n+1);b=A(2:2:2*n+1,1:2:2*n+1);
c=A(1:2:2*n+1,2:2:2*n+1);d=A(1:2:2*n+1,1:2:2*n+1);


3. The following script is written in the file FEM_ConvecDiffscript4.m: 


n=10;eps=0.01;lambda=1;
f = inline('ones(size(x))');
sm=FEM_ConvecDiffbP2(n,f);
nsm=sm(2:2:2*n+1)-b*inv(d)*sm(1:2:2*n+1);
u=(a-b*inv(d)*c)\nsm;  % computation of the stabilized solution
x=linspace(0,1,100);
uexa=FEM_ConvecDiffSolExa(eps,lambda,1,x);
plot(x,uexa);hold on
plot((1:n)/(n+1),u,'+');hold off;


For n = 10 and n = 20, see Fig. 4.10. 





























Fig. 4.10. Stabilized solution of the convection-diffusion problem: λ = 1, ε = 0.01,
and (a) n = 10 and (b) n = 20.


4. Figure 4.11 displays the stabilized solutions for n = 20 and the P1 solu- 
tions for n € {20, 40, 60, 80}. 


Solution of Exercise 4.8 


1. With the same set of parameters and a constant f, we have observed a
boundary layer. This is no longer true. We present some results in Fig.
4.12: the source term f transfers its oscillations to the solution u. Here
is the script that produces Fig. 4.12. This script is written in the file
FEM_ConvecDiffscript5.m:

n=100;lambda=1;eps=0.01;
x=(0:(n+1))'/(n+1);
A=FEM_ConvecDiffAP1(eps,lambda,n);
X=[ ];Y=[ ];
h=1/(n+1);tab=(1:n)'*h;
for af=1:5
   b=h*cos(af*pi*tab);
   y=A\b;y=[0; y; 0];
   X=[X x];Y=[Y y];
end;
plot(X,Y);

For a = 3/2, we observe a boundary layer; see Fig. 4.12. This observation
is discussed in the next question.


Fig. 4.11. Stabilized solution for n = 20 and the P1 solutions with n = 20 (top
left), n = 30 (top right), n = 40 (bottom left), n = 60 (bottom right). In each
panel the P1 solution is marked with (o) and the stabilized P2 solution with (+).





Fig. 4.12. Solution of the convection-diffusion equation: (a) f(x) = cos(aπx), for
a ∈ {1, 2, 3, 4, 5}, (b) f(x) = cos(3πx/2).


2. Computation of the exact solution.


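(a) It is easy to check that $\alpha + \beta e^{\theta x} - \frac{1}{\varepsilon}F_\theta$ is a solution of the differential
equation (4.1).

(b) The boundary conditions u(0) = 0 and u(1) = 0 allow the determination
of α and β,

$$\alpha = -\beta = \frac{1}{\varepsilon}\,\frac{F_\theta(1)}{1 - e^{\theta}},$$

and the solution is

$$u(x) = \frac{1}{\varepsilon(1 - e^{\theta})}\Big((1 - e^{\theta x})F_\theta(1) - (1 - e^{\theta})F_\theta(x)\Big). \qquad (4.14)$$

(c) f(x) = cos(aπx). Define $I = \int_0^z \cos(a\pi y)e^{-\theta y}\,dy$ and $J = \int_0^z \sin(a\pi y)e^{-\theta y}\,dy$.
Computing the real part of I + iJ, we get

$$\int_0^z f(y)\,e^{-\theta y}\,dy = \frac{a\pi\,e^{-\theta z}\sin(a\pi z) + \theta - \theta e^{-\theta z}\cos(a\pi z)}{\theta^2 + \pi^2a^2},$$

and deduce the identity

$$(\theta^2 + \pi^2a^2)\,F_\theta(x) = e^{\theta x} - \cos(a\pi x) - \frac{\theta}{a\pi}\sin(a\pi x).$$

The solution of the problem (4.1) is the sum of the two terms $-\frac{1}{\varepsilon}F_\theta(x)$
and $\alpha + \beta e^{\theta x}$.

• The first one can be split into three parts:
  • $\frac{1}{\varepsilon}\,\frac{\theta}{a\pi}\,\frac{\sin(a\pi x)}{\theta^2 + \pi^2a^2}$, whose limit is $\frac{1}{\lambda a\pi}\sin(a\pi x)$ for ε going to 0,
  • $\frac{1}{\varepsilon}\,\frac{\cos(a\pi x)}{\theta^2 + \pi^2a^2}$, whose limit is 0,
  • $\psi_1 = -\frac{1}{\varepsilon}\,\frac{e^{\theta x}}{\theta^2 + \pi^2a^2}$.
• The only term in $\alpha + \beta e^{\theta x}$ that does not go to 0 is

$$\psi_2 = -\frac{1}{\varepsilon}\,\frac{e^{\theta x}}{\theta^2 + \pi^2a^2}\,\frac{e^{\theta}}{1 - e^{\theta}}.$$

Since the sum $\psi_1 + \psi_2$ goes to 0, we deduce that

$$\lim_{\varepsilon\to 0^+} u(x) = \frac{1}{\lambda a\pi}\,\sin(a\pi x).$$


3. The function u is continuous and satisfies the boundary condition u(1) = 0; hence

$$\lim_{\varepsilon\to 0^+}\,\lim_{x\to 1} u(x) = \lim_{\varepsilon\to 0^+} u(1) = 0.$$

In addition, $\lim_{x\to 1}\lim_{\varepsilon\to 0^+} u(x) = \frac{1}{\lambda a\pi}\sin(a\pi)$. Consequently, for an
integer a, there is no boundary layer in the vicinity of x = 1 since
$\lim_{x\to 1}\lim_{\varepsilon\to 0^+} u(x) = \lim_{\varepsilon\to 0^+}\lim_{x\to 1} u(x) = 0$. Conversely, for a noninteger
value of a, there exists a boundary layer since $\lim_{x\to 1}\lim_{\varepsilon\to 0^+} u(x) \neq 0$.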
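
The limit sin(aπx)/(λaπ) can be observed numerically away from x = 1. The sketch below is illustrative only: it assembles the P1 matrix directly in its tridiagonal form instead of calling FEM_ConvecDiffAP1, and the values of ε, n, and the cutoff x ≤ 0.9 are assumptions chosen so that the Peclet number stays below 1 and the boundary layer is excluded from the comparison.

```matlab
lambda = 1; epsil = 1e-3;                % assumed small viscosity
n = 4000; h = 1/(n+1); e1 = ones(n,1);   % Pe = lambda*h/(2*epsil) < 1
% P1 matrix: (eps/h)*Tridiag(-1,2,-1) + (lambda/2)*Tridiag(-1,0,1)
A = epsil/h*spdiags([-e1 2*e1 -e1],-1:1,n,n) ...
    + lambda/2*spdiags([-e1 0*e1 e1],-1:1,n,n);
x = (1:n)'*h;
inner = x <= 0.9;                        % stay away from the layer at x = 1

avals = [1 1.5];                         % an integer and a noninteger value
dev = zeros(size(avals)); jump = zeros(size(avals));
for k = 1:numel(avals)
    a = avals(k);
    u = A\(h*cos(a*pi*x));               % trapezoidal right-hand side
    ulim = sin(a*pi*x)/(lambda*a*pi);    % limit of u as epsilon goes to 0
    dev(k)  = norm(u(inner)-ulim(inner),inf);
    jump(k) = abs(sin(a*pi)/(lambda*a*pi));  % mismatch with u(1) = 0
end
```

The entries of dev are of the order of ε, while jump vanishes (up to rounding) for a = 1 and is about 0.21 for a = 3/2, which is the size of the boundary layer.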
Solving a Differential Equation by a Spectral
Method


Project Summary 


Level of difficulty: 2 


Keywords: Spectral method, polynomial approximation, Gauss 
quadrature, orthogonal polynomials, Legendre polyno- 
mials, variational formulation 


Application fields: Whenever high accuracy is required to compute 
smooth solutions of PDEs 


Introduction 


Spectral methods are approximation techniques for the computation of the 
solutions to ordinary and partial differential equations. They are based on 
a polynomial expansion of the solution. The precision of these methods is 
limited only by the regularity of the solution, in contrast to the finite difference 
method and the finite element methods. The approximation is based primarily 
on the variational formulation of the continuous problem. The test functions 
are polynomials and the integrals involved in the formulation are computed by 
suitable quadrature formulas. This project proposes to implement a spectral 
method to solve the following boundary value problem defined on the interval 
Ω = (−1, 1):

$$\begin{cases} -u'' + cu = f, \\ u(-1) = 0, \\ u(1) = 0, \end{cases} \qquad (5.1)$$

with f ∈ L²(Ω) and c a positive real number.
The first part of the project consists in pointing out some properties of the 
Legendre polynomials. These polynomials will be used to design a basis of the 
approximation space. In the second part, we define the Legendre expansion of
a function and compute the truncated Legendre expansion. To this end, we 
introduce a method to compute the integrals accurately, namely the Gauss 
quadrature formula. Finally, in the third part of the project, we implement 
the approximation of the differential equation (5.1) by a spectral method. 





5.1 Some Properties of the Legendre Polynomials 





Let $P_n$ denote the set of all polynomials with degree less than or equal to a
positive integer n and $(L_n)_{n\ge 0}$ the family of Legendre polynomials. Note that
these polynomials form an orthogonal basis on [−1, 1] since for all integers n
and m,

$$\int_{-1}^{1} L_n(x)\,L_m(x)\,dx = \frac{2}{2n+1}\,\delta_{n,m}. \qquad (5.2)$$

The Legendre polynomials are solutions of the differential equations

$$\big((1 - x^2)L_n'(x)\big)' + n(n+1)L_n(x) = 0, \quad n \ge 0, \qquad (5.3)$$

and they satisfy the following three-term recurrence formula:

$$\begin{cases} L_0(x) = 1,\\ L_1(x) = x,\\ (n+1)L_{n+1}(x) = (2n+1)\,x\,L_n(x) - n\,L_{n-1}(x), \quad \text{for } n \ge 1, \end{cases} \qquad (5.4)$$

from which we deduce the special values $L_n(\pm 1) = (\pm 1)^n$.





Exercise 5.1. 1. Write a function y=SPE_LegLinComb(x,c) that computes a linear
combination of the Legendre polynomials of the form

$$y(x) = \sum_{k=1}^{p} c_k\,L_{k-1}(x), \qquad (5.5)$$

where p is the number of coefficients. The inputs of the program are:
• an array (a vector) c that contains the coefficients $c_k$,
• an array x that contains the points of the grid.

2. Write a script using y=SPE_LegLinComb(x,c) with x corresponding to
a fine discretization of the interval [−1, 1] and c corresponding to the
combination $L_0 - 2L_1 + 3L_5$. Plot y as a function of x as in Fig. 5.1.

A solution of this exercise is proposed in Sect. 5.6 at page 120.

Fig. 5.1. Linear combination $L_0 - 2L_1 + 3L_5$.


5.2 Gauss—Legendre Quadrature 


Numerical quadratures (or rules) are efficient tools for computing an approximation
of an integral (see Krommer and Ueberhuber (1994)). In the general
case, no antiderivative of the integrand is available, but the values of the
integrand itself can be easily computed. Gauss quadrature is based on the
following result, holding for smooth functions φ:

$$\int_{-1}^{1}\varphi(x)\,dx = \sum_{i=1}^{s}\omega_i\,\varphi(x_i) + R_s(\varphi), \qquad (5.6)$$

where

1. the points $x_i$ (called the nodes of the formula) are the zeros of the Legendre
polynomial $L_s$,
2. the real numbers $\omega_i$ (called the weights of the formula) are given by

$$\omega_i = \frac{2}{(1 - x_i^2)\,[L_s'(x_i)]^2}, \qquad (5.7)$$

3. the remainder is

$$R_s(\varphi) = \frac{2^{2s+1}(s!)^4}{(2s+1)\,[(2s)!]^3}\,\varphi^{(2s)}(\xi), \quad \text{for } \xi \in (-1,1).$$

The Gauss-Legendre quadrature formula is the approximation

$$\int_{-1}^{1}\varphi(x)\,dx \approx \sum_{i=1}^{s}\omega_i\,\varphi(x_i). \qquad (5.8)$$

This formula is exact for $\varphi \in P_{2s-1}$, since in such a case the remainder $R_s(\varphi)$
is null.

In order to use the Gauss quadrature formula, we explain an efficient way
to compute its weights and nodes. For all x, the recurrence relations (5.4) for
j = 0, ..., s − 1 can be written in a compact matrix form,

$$Mu = xu + v, \qquad (5.9)$$

where

$$M = \begin{pmatrix}
0 & 1 & & & \\
a_1 & 0 & b_1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{s-2} & 0 & b_{s-2}\\
 & & & a_{s-1} & 0
\end{pmatrix}, \qquad (5.10)$$

with $a_j = j/(2j+1)$, $b_j = (j+1)/(2j+1)$,

$$v = -b_{s-1}L_s(x)\begin{pmatrix} 0\\ \vdots\\ 0\\ 1 \end{pmatrix}, \quad\text{and}\quad u = \begin{pmatrix} L_0(x)\\ \vdots\\ L_{s-1}(x) \end{pmatrix}. \qquad (5.11)$$

Now let x be a zero of $L_s$; then v = 0 and the linear system (5.9) becomes

$$Mu = xu, \qquad (5.12)$$

which means that the zeros of $L_s$ are the eigenvalues of the s × s tridiagonal
matrix M. To compute the weight $\omega_i$ with formula (5.7), the recurrence
formula (5.4) can be combined with the following relation:

$$(1 - x^2)L_s'(x) = -s\,x\,L_s(x) + s\,L_{s-1}(x), \quad s \ge 1. \qquad (5.13)$$

Since $L_s(x_i) = 0$, we get finally

$$\omega_i = \frac{2(1 - x_i^2)}{(s\,L_{s-1}(x_i))^2},$$

where $L_{s-1}(x_i)$ is computed by the recurrence formula (5.4).


Exercise 5.2. 


1. Write a function SPE_xwGauss that computes the weights and nodes of a
Gauss-Legendre quadrature formula. Compare your results for s = 8 with
the table below:

      nodes x_i                weights ω_i
   ±0.18343464249565       0.36268378337836
   ±0.52553240991633       0.31370664587789
   ±0.79666647741363       0.22238103445337
   ±0.96028985649754       0.10122853629036

2. Write a script to validate your function: test the quadrature formula on
various integrals whose exact values are known. In particular, check the
exactness of the formula for polynomials in $P_{2s-1}$ and compare the exact
and approximate values of the integral of $e^x$ on (−1, 1).


A solution of this exercise is proposed in Sect. 5.6. at page 121. 
5.3 Legendre Expansions

We now associate to a function f ∈ L²(−1, 1) its Legendre expansion

$$\mathcal{L}(f) = \sum_{j=0}^{\infty}\hat f_j\,L_j,$$

with the Legendre coefficients $\hat f_j$ defined by

$$\hat f_j = \frac{2j+1}{2}\int_{-1}^{1} f(x)\,L_j(x)\,dx. \qquad (5.14)$$

We also define the truncated expansion (see Sect. 3.3 page 63)

$$\mathcal{L}_p(f) = \sum_{j=0}^{p}\hat f_j\,L_j.$$

This is an approximation of the function f, which is exact for polynomials in
$P_p$ since $(L_j)_{j=0}^{p}$ is an orthogonal basis of $P_p$. The calculation of the coefficients
$\hat f_j$ is done by a quadrature formula. The error induced by this approximation
of f must be of the same order or negligible compared to the total error of
the spectral method. Thus, it is necessary to use a high-order quadrature, and
for this reason, the Gauss-Legendre quadrature is very suitable.


Exercise 5.3. 





1. Write a script that computes the truncated Legendre expansion $\mathcal{L}_p(f)$ of
a function f. The script includes the following steps:
• Compute the nodes and weights of the Gauss-Legendre quadrature for
a given s.
• Compute the Legendre coefficients $\hat f_j$ (5.14) by the quadrature
formula (5.8).
• Plot the function and its truncated Legendre expansion $\mathcal{L}_p(f)$ on
[−1, 1].
2. Compare with the example in Fig. 5.2.
3. Justify the choice of the Gauss quadrature parameter s as a function of
the degree p of the truncated series.
4. Test the script with less-regular functions, namely the functions abs(x)
and sign(x).
5. Plot the error (computed in the supremum norm) between f and $\mathcal{L}_p(f)$
as a function of the truncation parameter p.





A solution of this exercise is proposed in Sect. 5.6. at page 122. 


The choice of the number s of Gauss nodes is related to the degree p of
the truncated expansion. The quadrature formula (5.8) with s nodes is exact
on $P_{2s-1}$. In addition, every polynomial $f \in P_p$ is its own Legendre series
$f = \mathcal{L}_p(f)$. In order to make the computation of the Legendre coefficients
exact for $f \in P_p$, it is necessary to take s such that 2s − 1 ≥ 2p, that is,
s ≥ p + 1. The reader will verify that for a very smooth function, let us say
$f \in C^\infty(-1,1)$, the error $f - \mathcal{L}_p(f)$ decreases to zero as p goes to infinity. This
decrease is faster than any power of 1/p, as shown in Fig. 5.3. In this figure,
the error stops decreasing from p ≈ 20, since the computer accuracy is then
reached. In contrast, running the same script for the nonsmooth function |x|
exhibits a very slow convergence, displayed in Fig. 5.4. For the discontinuous
function sign, the expansion does not converge to f in the norm $L^\infty$, although
it does converge in the $L^2$ norm. The truncated series $\mathcal{L}_p f$ has oscillations.
As p increases, the size of the oscillations decreases slowly, except near the
discontinuity x = 0, where the oscillations remain. This problem is known as
the Gibbs phenomenon. The results for functions |x| and sign(x) are displayed
in Fig. 5.5.

Fig. 5.2. Function f(x) = sin(6x)exp(−x), Le(f), and Lo(f).

Fig. 5.3. Error $\|f - \mathcal{L}_p f\|_\infty$ for f(x) = sin(6x)exp(−x).

Fig. 5.4. Error $\|f - \mathcal{L}_p f\|_\infty$ for f(x) = |x|.

Fig. 5.5. Comparison of f with $\mathcal{L}_{30}f$. (a) f(x) = |x|; (b) f(x) = sign(x).


5.4 A Spectral Discretization 


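We consider a variational formulation of the problem (5.1) (see Chap. 4):

Find $u \in H_0^1(-1,1)$ such that for all $v \in H_0^1(-1,1)$,

$$\int_{-1}^{1} u'(x)\,v'(x)\,dx + c\int_{-1}^{1} u(x)\,v(x)\,dx = \int_{-1}^{1} f(x)\,v(x)\,dx. \qquad (5.15)$$

Every regular solution of (5.15) is a solution to the problem (5.1). The space
of test functions for this spectral method is the subset of $P_m$ defined by

$$P_m^0(\Omega) = \{p \in P_m,\ p(-1) = p(1) = 0\}.$$

This linear space has dimension m − 1 and can be rewritten as

$$P_m^0(\Omega) = \{p = (1 - x^2)\,q,\ q \in P_{m-2}\}.$$

Since $P_m^0(\Omega)$ is included in $H_0^1(\Omega)$, we can easily define the variational
approximation called the spectral Galerkin method:

Find $u_m \in P_m^0(\Omega)$ such that for all $v_m \in P_m^0(\Omega)$:

$$\int_{-1}^{1} u_m'(x)\,v_m'(x)\,dx + c\int_{-1}^{1} u_m(x)\,v_m(x)\,dx = \int_{-1}^{1} f(x)\,v_m(x)\,dx. \qquad (5.16)$$

The functions $F_i = (1 - x^2)L_i'$ for i = 1, ..., m − 1 form a basis of $P_m^0(\Omega)$. We
denote this basis by $\mathcal{F}_m$. By linearity, the problem (5.16) is equivalent to the
problem

Find $u_m \in P_m^0(\Omega)$ such that for all $F_i \in \mathcal{F}_m$:

$$\int_{-1}^{1} u_m'(x)\,F_i'(x)\,dx + c\int_{-1}^{1} u_m(x)\,F_i(x)\,dx = \int_{-1}^{1} f(x)\,F_i(x)\,dx. \qquad (5.17)$$

Let $\tilde u_m = (\tilde u_1,\dots,\tilde u_{m-1})^T$ be the vector whose entries are the coefficients
of $u_m$ in the basis $\mathcal{F}_m$:

$$u_m = \sum_{i=1}^{m-1}\tilde u_i\,F_i. \qquad (5.18)$$

By plugging this expansion into (5.17), we get a linear system for $\tilde u_m$, which
we write in matrix form as

$$A_m\tilde u_m = b_m, \qquad (5.19)$$

where the (m − 1) × (m − 1) matrix $A_m$ and the vector $b_m \in \mathbb{R}^{m-1}$ are defined
by

$$(A_m)_{i,j} = \frac{2\,i^2(i+1)^2}{2i+1}\,\delta_{i,j}
+ c\,\frac{i(i+1)\,j(j+1)}{(2i+1)(2j+1)}\left(\frac{4(2i+1)\,\delta_{i,j}}{(2i-1)(2i+3)} - \frac{2\,\delta_{i,j-2}}{2i+3} - \frac{2\,\delta_{i,j+2}}{2i-1}\right),$$

$$(b_m)_i = \int_{-1}^{1} f(x)\,F_i(x)\,dx = \frac{2\,i(i+1)}{2i+1}\left(\frac{\hat f_{i-1}}{2i-1} - \frac{\hat f_{i+1}}{2i+3}\right).$$

The terms $\hat f_k$ in the previous definition are the Legendre coefficients of the
right-hand-side function f defined by (5.14).

Once the linear system is solved (i.e., the coefficients of $u_m$ in the basis $\mathcal{F}_m$
are known), the coefficients $\hat u_k$ of $u_m$ in the Legendre basis $(L_j)_{j=0}^{m}$ are computed
using the identities

$$\hat u_k = \frac{(k+1)(k+2)}{2k+3}\,\tilde u_{k+1} - \frac{(k-1)k}{2k-1}\,\tilde u_{k-1}, \quad k = 0,\dots,m, \qquad (5.20)$$

with the convention $\tilde u_j = 0$ for j < 1 or j > m − 1.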
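
The assembly of (5.19) and the change of basis (5.20) can be sketched in a few lines. This is not the program of Sect. 5.6: the benchmark u_e(x) = sin(πx), the sizes m and s, and all variable names are assumptions made for illustration, and the Gauss nodes are recomputed inline (eigenvalues of the matrix M of Sect. 5.2) so that the example is self-contained. It uses the identity F_i = i(i+1)/(2i+1) (L_{i−1} − L_{i+1}).

```matlab
m = 16; c = 1; s = m+2;                     % assumed sizes, s >= m+1
ue = @(x) sin(pi*x);                        % assumed benchmark, ue(-1)=ue(1)=0
f  = @(x) (pi^2+c)*sin(pi*x);               % f = -ue'' + c*ue

% Gauss-Legendre nodes and weights
j  = (0:s-1)';
M  = diag(j(2:s)./(2*j(2:s)+1),-1) + diag((j(1:s-1)+1)./(2*j(1:s-1)+1),1);
xg = sort(real(eig(M)));
P  = zeros(s,m+2); P(:,1) = 1; P(:,2) = xg; % P(:,k+1) = L_k(xg)
for k = 1:m
    P(:,k+2) = ((2*k+1)*xg.*P(:,k+1) - k*P(:,k))/(k+1);
end
wg = 2*(1-xg.^2)./(s*P(:,s)).^2;            % column s holds L_{s-1}
fh = (2*(0:m)'+1)/2 .* (P(:,1:m+1)'*(wg.*f(xg)));  % fh(k+1) = Legendre coeff k

% Matrix A_m and vector b_m
ii = (1:m-1)'; d = ii.*(ii+1)./(2*ii+1);
T  = diag(4*(2*ii+1)./((2*ii-1).*(2*ii+3))) ...
     - diag(2./(2*ii(1:end-2)+3),2) - diag(2./(2*ii(3:end)-1),-2);
A  = diag(2*ii.^2.*(ii+1).^2./(2*ii+1)) + c*(d*d').*T;
b  = d.*(2./(2*ii-1).*fh(ii) - 2./(2*ii+3).*fh(ii+2));
ut = A\b;                                   % coefficients in the basis F_i

% Legendre coefficients of u_m, identity (5.20)
uh = zeros(m+1,1);
uh(ii)   = uh(ii)   + d.*ut;
uh(ii+2) = uh(ii+2) - d.*ut;

% Evaluation on a grid and comparison with the benchmark
x = linspace(-1,1,201)'; L0 = ones(size(x)); L1 = x;
um = uh(1)*L0 + uh(2)*L1;
for k = 1:m-1
    L2 = ((2*k+1)*x.*L1 - k*L0)/(k+1);
    um = um + uh(k+2)*L2; L0 = L1; L1 = L2;
end
err = norm(um-ue(x),inf);
```

For this smooth benchmark the error err is already very small with m = 16, which illustrates the high accuracy announced in the introduction.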


Exercise 5.4. 


1. Write a program including the following steps:

• Compute the matrix $A_m$ and the vector $b_m$. This step includes the
computation of the Legendre coefficients $\hat f_k$ of the source term of the
differential equation.
• Solve the linear system (5.19).
• Compute the Legendre coefficients $\hat u_k$ of the numerical solution $u_m$
using (5.20).
• Plot on the same figure the function u and the numerical approximation
$u_m$ on [−1, 1].

2. Construction of an exact solution: take any reasonable function $u_e(x)$ such
that $u_e(-1) = u_e(1) = 0$ and compute f(x) in such a way that $u_e$ solves
the problem (5.1) with a constant c. Program the two functions $u_e(x)$
and f(x). The function $u_e$ will be used as a benchmark to compare the
spectral solution and the exact solution.

3. Advantages of the method: compare the spectral solution to a solution 
computed by a finite difference scheme leading to a linear system of iden- 
tical dimension (see Section 8.2). Do you obtain the same precision? Com- 
pute the error between the exact solution and the spectral one. Increase 
the number of points in the finite difference discretization to get the same 
precision. Draw some conclusion on the respective advantages of the two 
methods. 





A solution of this exercise is proposed in Sect. 5.6. at page 122. 


5.5 Possible Extensions 


The paragraph on the quadrature rules has several extensions. One can use
a similar method to compute the nodes and weights of Gauss quadrature
corresponding to other families of orthogonal polynomials. For example, the
analogous formula to (5.6) for integrals on the real line R is

$$\int_{-\infty}^{+\infty} e^{-x^2} f(x)\,dx = \sum_{i=1}^{n}\omega_i\,f(x_i) + R_n(f). \qquad (5.21)$$

Here the nodes $x_i$ are the zeros of the Hermite polynomial $H_n$ (see below), the
weights $\omega_i$ are given by (see (Davis, 1975; Szegő, 1975))

$$\omega_i = \frac{2^{n+1}\,n!\,\sqrt{\pi}}{[H_n'(x_i)]^2},$$

and the remainder is

$$R_n(f) = \frac{n!\,\sqrt{\pi}}{2^n\,(2n)!}\,f^{(2n)}(\xi).$$

The Hermite polynomials are defined by

$$\begin{cases} H_0(x) = 1,\\ H_1(x) = 2x,\\ 2x\,H_n(x) = H_{n+1}(x) + 2n\,H_{n-1}(x). \end{cases}$$

They are orthogonal for the inner product

$$(f,g) = \int_{-\infty}^{+\infty} f(x)\,g(x)\,e^{-x^2}\,dx.$$
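
The eigenvalue approach of Sect. 5.2 carries over to this case. The lines below are a sketch under stated assumptions: the number of nodes n = 5 and the variable names are illustrative, the matrix M is built from the recurrence written as x H_j = j H_{j−1} + (1/2) H_{j+1}, and the weights use the relation H_n' = 2n H_{n−1}.

```matlab
n = 5;                                   % assumed number of nodes
j = (1:n-1)';
% x*H_j = j*H_{j-1} + (1/2)*H_{j+1}: subdiagonal j, superdiagonal 1/2
M = diag(j,-1) + diag(0.5*ones(n-1,1),1);
x = sort(real(eig(M)));                  % nodes = zeros of H_n

% H_{n-1} at the nodes by the three-term recurrence
Hm2 = ones(n,1); Hm1 = 2*x;              % H_0 and H_1
for k = 1:n-2
    Hk  = 2*x.*Hm1 - 2*k*Hm2;            % H_{k+1}
    Hm2 = Hm1; Hm1 = Hk;
end
% weights, using H_n'(x_i) = 2*n*H_{n-1}(x_i)
w = 2^(n+1)*factorial(n)*sqrt(pi)./(2*n*Hm1).^2;

I0 = sum(w);                             % integral of exp(-x^2)
I2 = w'*x.^2;                            % integral of x^2*exp(-x^2)
```

Since the formula is exact for polynomials of degree at most 2n − 1, I0 and I2 should reproduce √π and √π/2 up to rounding errors.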
Regarding the application of spectral methods to PDEs, one can consider
generalizing the example studied here to higher dimensions. Up to dimension
two or three, the main feature of spectral methods is still the approximation 
by high-degree polynomials, using this time tensor products of polynomial 
bases. These objects have a high precision and turn out to be very efficient for 
simple geometries: rectangular prism or cylinder, for instance. The treatment 
of the Laplace equation in a square or cubic domain is thoroughly detailed in 
Bernardi, Dauge, and Maday (1999) and in the recent work by Bernardi, Ma- 
day, and Rapetti (2004) (in French), with several possible types of boundary 
conditions (Dirichlet, Neumann, and Fourier). For complicated geometries, 
a domain decomposition technique is required (see for instance Wohlmuth 
(2001)). Detailed problems are also proposed in Bernardi and Maday (1997) 
and Bernardi, Maday, and Rapetti (2004), which can be handled starting from 
the case treated in this project, such as for instance the spectral discretization 
of the Dirichlet problem in an axisymmetric domain or the heat equation in 
one dimension. 





5.6 Solutions and Programs 


Solution of Exercise 5.1 

The computation of the linear combination (5.5) is performed by the function
SPE_LegLinComb. In order to compute the values of the Legendre polynomial
of degree p at points $x_1,\dots,x_n$, there is no need to store all the values of the
polynomials of degree less than p. Only the values corresponding to degrees
p − 1 and p − 2, which come into play in the recurrence relation (5.4), must
be stored in two arrays pol1 and pol2, along with the current values that are
stored in pol. The values of the linear combination are stored in an array y
to which the terms of the sum (5.5) are added as they are computed.
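
The storage scheme just described can be sketched as follows. This is not the listing of SPE_LegLinComb: the code is written as a plain script, the array names pol, pol1, pol2 follow the description above, and the coefficient array and the grid are those of the combination L_0 − 2L_1 + 3L_5 used in the exercise.

```matlab
c = [1; -2; 0; 0; 0; 3];                 % L_0 - 2*L_1 + 3*L_5
x = (-1:1/250:1)';                       % 501 evenly spaced points
pol2 = ones(size(x));                    % values of L_0
pol1 = x;                                % values of L_1
y = c(1)*pol2;
if numel(c) > 1
    y = y + c(2)*pol1;
end
for n = 1:numel(c)-2
    % recurrence (5.4): (n+1) L_{n+1} = (2n+1) x L_n - n L_{n-1}
    pol = ((2*n+1)*x.*pol1 - n*pol2)/(n+1);
    y = y + c(n+2)*pol;                  % c(n+2) multiplies L_{n+1}
    pol2 = pol1; pol1 = pol;             % shift the two stored degrees
end
```

A call to plot(x,y) then displays the combination; the special values L_n(±1) = (±1)^n give y(1) = 2 and y(−1) = 0, which is a quick sanity check.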


The graphical display of $L_0 - 2L_1 + 3L_5$ on the interval [−1, 1] (see Fig. 5.1)
is done in the script SPE_PlotLegPol.m. The function SPE_LegLinComb receives
in its input argument the array [1; −2; 0; 0; 0; 3] containing the coefficient
values of the linear combination along with the points $x_i = -1 + (i-1)/250$
for i = 1, ..., 501 at which this function must be displayed.

The MATLAB built-in function L=legendre(n,x) computes an array L
with n+1 rows, whose (m+1)th row holds the values of the Legendre function
$L_n^m$ defined by

$$L_n^m(x) = (-1)^m (1 - x^2)^{m/2}\,\frac{d^m}{dx^m}L_n(x), \qquad (5.22)$$

at points specified by the vector x. Therefore it is possible to compute the
values of the Legendre polynomial with this function using only the first row
of the computed array. Another method to compute the linear combination
is to call the function legendre for all degrees from 0 up to p, to extract
for each degree the first row of the output array, and to multiply it by the
corresponding coefficient.

The script SPE_PlotLegPol compares the computing times required by the 
two methods, by calling the function tic before the function SPE LegLinComb 
and the function toc right afterward. The value returned by toc contains the 
computing time in seconds. The same thing is done again before and after the 
group of commands for the method using the legendre function. In order to 
get meaningful computing time estimates, it is best to increase the number of 
computing points to 500 points evenly spaced on the interval [—1,1] and to 
increase simultaneously the degree of the linear combination to 50. The ratio 
between the computing times is then higher than 100, which is unquestion- 
ably in favor of the script SPE_LegLinComb.m. Using the function legendre 
in this context implies a great number of redundant or useless computations. 


Solution of Exercise 5.2 

The computation of Gauss abscissas and integration weights is done by the 
function SPE_xwGauss. The abscissas are the eigenvalues of the matrix M de- 
fined by (5.10); the function therefore starts by building the matrix and using 
the MATLAB built-in function eig to compute its spectrum. 

Once the abscissas and weights are computed and stored in two column
vectors x and w, the quadrature formula (5.8) is encoded with a single MATLAB
command. It consists in computing the scalar product of the vector w
with the vector holding the values of the function at the integration abscissas
x: I=w'*f(x);
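
The steps described above (matrix M, eigenvalues, weights, scalar product) can be put together as follows. This is a sketch and not the listing of SPE_xwGauss: it is written as a script with s = 8, the variable names are assumptions, and real(...) is only a safeguard against spurious rounding in the nonsymmetric eigenvalue computation.

```matlab
s = 8;                                   % number of nodes (assumed value)
j = (0:s-1)';
a = j./(2*j+1);                          % a_j = j/(2j+1)
b = (j+1)./(2*j+1);                      % b_j = (j+1)/(2j+1)
M = diag(a(2:s),-1) + diag(b(1:s-1),1);  % matrix (5.10)
x = sort(real(eig(M)));                  % nodes = zeros of L_s

% L_{s-1} at the nodes by the recurrence (5.4)
Lm2 = ones(s,1); Lm1 = x;                % L_0 and L_1
for n = 1:s-2
    Ln  = ((2*n+1)*x.*Lm1 - n*Lm2)/(n+1);
    Lm2 = Lm1; Lm1 = Ln;
end
w = 2*(1-x.^2)./(s*Lm1).^2;              % weights

I = w'*exp(x);                           % quadrature of exp on (-1,1)
errI = abs(I - (exp(1)-exp(-1)));        % comparison with the exact value
```

With s = 8 the vectors x and w can be compared directly with the table of Exercise 5.2, and errI is at the level of rounding errors.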

The script SPE_TestIntGauss.m tests the quadrature formula on a smooth
function, here the function $e^x$, and also compares this integration method
with the method proposed by MATLAB, which is programmed in the built-in
function quad.

The calling syntax is q = quad(@fun,a,b). This command returns the value
of the integral of the function defined in fun.m between the bounds a and b
to a default precision of $10^{-6}$. The algorithm used in quad is the adaptive
version of Simpson’s rule (see (4.5) in Chap. 4), and one can specify the required
precision by adding a fourth input argument: q = quad(@fun,a,b,preci). It
is also possible to get in the output the number of calls to the integrand that
were performed: [q,nb] = quad(@fun,a,b,preci).

In order to compare the computing time performances of the function quad
with the Gauss method, we use for the latter a number of points large enough
to obtain the value of the integral with six significant digits. Four points are
sufficient in the case of the function $e^x$, for instance. The respective com-
puting times are estimated using functions tic and toc as in the previous 
exercise. Simpson’s method being less accurate than the Gauss quadrature 
on four points, it requires more evaluations of the integrand and is therefore 
slower. Hence in the context of our project, where a great number of integrals 
must be performed, using Gauss quadrature is more suitable. 

Solution of Exercise 5.3

The comparison of the function with its truncated Legendre expansion is
performed in the script SPE_AppLegExp.m. It produces Fig. 5.2, where the
function $f(x) = \sin(6x)\exp(-x)$ is displayed along with Lg(f) and Lo(f).
A slight modification produces Fig. 5.5 (a), where the function $f(x) = |x|$ is
compared with its truncated series $L_P f$ of order $P = 30$, and Fig. 5.5 (b), for
$f(x) = \mathrm{sign}(x)$.

This script calls the function SPE_CalcLegExp, which receives the following
input arguments:

• s: the degree of the Gauss quadrature to be used to compute the coefficients
  of the expansion (5.14),

• P: the degree of the truncated expansion,

• npt: the number of points in the interval $[-1,1]$ where the expansion is
  computed,

• test: the name of the function whose expansion is computed, which should
  be defined either by an inline function or in a file.








The function returns the output arguments:

• x: the npt abscissas,

• y: the values of the expansion at abscissas x,

• err: the error in the supremum norm between the function and its
  expansion, estimated on the values at abscissas x.


This function is also used in the script SPE_LegExpLoop.m to answer
question 5 of the exercise, illustrated in Figs. 5.3 and 5.4. The Legendre
expansion of a test function and its error with respect to the function itself
are computed for different degrees, varying here between 2 and 30 with an
increment of 2. The error in the $L^\infty$ norm is then displayed as a function of the
degree of the truncated expansion. For a smooth function we expect an
exponential decay of the error, which is actually what we obtain numerically for
the function $f(x) = \sin(6x)\exp(-x)$. For less-smooth functions, for instance
$f(x) = |x|$, the error decreases proportionally to $1/P$. Finally, for
a discontinuous function, such as $f(x) = \mathrm{sign}(x)$, the error does not go to 0
when the degree of the expansion increases, due to the Gibbs phenomenon
(see Fig. 5.5).
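The degree loop described above can be sketched as follows. This is a hedged, self-contained illustration and not the book's SPE_LegExpLoop.m: the Gauss rule is rebuilt inline from a Jacobi matrix (an assumption about how the abscissas are obtained), the number of Gauss points s = 64 and the 200 evaluation points are arbitrary choices, and the coefficients are computed as $f_k = \frac{2k+1}{2}\int_{-1}^{1} f L_k$ with the three-term recurrence for $L_k$.

```matlab
% Sketch: sup-norm error of the truncated Legendre expansion for P = 2:2:30.
f = @(t) sin(6*t) .* exp(-t);
s = 64;                                       % Gauss points (arbitrary choice)
k = (1:s-1)';  b = k ./ sqrt(4*k.^2 - 1);
[V, D] = eig(diag(b, 1) + diag(b, -1));
[xg, idx] = sort(diag(D));  wg = 2 * (V(1, idx)').^2;

xa = linspace(-1, 1, 200)';                   % evaluation points
degrees = 2:2:30;
err = zeros(size(degrees));
for n = 1:numel(degrees)
    P = degrees(n);
    Lg0 = ones(s, 1);       Lg1 = xg;         % L_0, L_1 at the Gauss points
    La0 = ones(size(xa));   La1 = xa;         % L_0, L_1 at the evaluation points
    y = 0.5 * (wg' * f(xg)) * La0 + 1.5 * (wg' * (f(xg) .* xg)) * La1;
    for j = 2:P
        Lg2 = ((2*j-1) * xg .* Lg1 - (j-1) * Lg0) / j;   % recurrence for L_j
        La2 = ((2*j-1) * xa .* La1 - (j-1) * La0) / j;
        fj  = (2*j+1) / 2 * (wg' * (f(xg) .* Lg2));      % coefficient f_j
        y   = y + fj * La2;
        Lg0 = Lg1;  Lg1 = Lg2;  La0 = La1;  La1 = La2;
    end
    err(n) = norm(y - f(xa), inf);
    fprintf('P = %2d   sup-norm error = %.2e\n', P, err(n));
end
```

Replacing f by @(t) abs(t) or @(t) sign(t) in this sketch should reproduce qualitatively the slower decay and the stagnation discussed above.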











Solution of Exercise 5.4 

We select here as a test case the function $u(x) = \sin(\pi x)\cos(10x)$, which
satisfies the homogeneous Dirichlet boundary conditions $u(-1) = u(1) = 0$, and
we program it in SPE_special.m. By setting the right-hand side to

$$f(x) = (\pi^2 + 130)\,\sin(\pi x)\cos(10x) + 20\pi\,\cos(\pi x)\sin(10x)$$

and the constant $c = 30$, the function $u$ is a solution of problem (5.1). The
right-hand side is programmed in the file SPE_fbe.m, using the second
derivative of the solution function, which is programmed in SPE_specsec.m. The
following script SPE_SpecMeth.m computes the numerical solution using the
spectral Galerkin method. It then compares it with the solution computed
using the finite difference method:


m=16;    % degree of Legendre approximation
s=m+1;   % degree of Gauss quadrature for the right-hand side
global c
c=30.;

% Construction of the matrix
A=zeros(m,m);
for i=1:m
   A(i,i)=(i*(i+1))^2*(1./(0.5+i)+...
          4.*c/((2.*i+1.)*(2*i-1)*(2*i+3)));
end
for i=1:m-2
   A(i,i+2)=-2*c*i*(i+1)*(i+2)*(i+3)/((2*i+1)*(2*i+3)*(2*i+5));
end
for i=3:m
   A(i,i-2)=-2*c*i*(i+1)*(i-2)*(i-1)/((2*i-1)*(2*i-3)*(2*i+1));
end
% Construction of the right-hand-side vector
[absc,weights]=SPE_xwGauss(s);
t=SPE_fbe(absc); u=t.*weights;
LX0=ones(s,1);
LX1=absc;
C=zeros(m+2,1);
C(1)=t'*weights/2; C(2)=3*u'*LX1/2;
for k=2:m+1
   % computes $f_k$ in C(k+1)
   % computes values of $L_k$ at integration abscissas
   % k*L_k=(2k-1)*x*L_{k-1}-(k-1)*L_{k-2}
   LX2=((2*k-1)*absc.*LX1-(k-1)*LX0)/k;
   C(k+1)=(2*k+1)*u'*LX2/2;
   LX0=LX1;
   LX1=LX2;
end
B=zeros(m,1);
for i=1:m
   B(i)=2*i*(i+1)*(C(i)/(2*i-1)-C(i+2)/(2*i+3))/(2*i+1);
end
% Solves the linear system
U=linsolve(A,B);
%
% Change of basis
% (1-x^2)L_i'=(i(i+1)/(2i+1))*(L_{i-1}-L_{i+1})
UN=zeros(1,m+2);
for k=1:m
   CC=(k+1)*k*U(k)/(2*k+1);
   UN(k)=UN(k)+CC;
   UN(k+2)=UN(k+2)-CC;
end
%
% Computes approximate solution and error
%
n=100;
xa=linspace(-1,1,n);
y=SPE_LegLinComb(xa,UN);
es=norm(y-SPE_special(xa),inf);
%
% Computes finite difference solution
mdf=50; h=2/mdf;
xdf=linspace(-1+h,1-h,mdf-1)';
A=toeplitz([2,-1,zeros(1,mdf-3)])/h^2+c*eye(mdf-1,mdf-1);
B=SPE_fbe(xdf); ydf=linsolve(A,B);
%
% Graphical display
%
plot(xa,SPE_special(xa),xa,y,'--',xdf,ydf,'x')
legend('exact','spectral','Finite diff.')
fprintf('error spectral= %e finite diff.= %e \n',...
        es,norm(ydf-SPE_special(xdf),inf));

Fig. 5.6. Comparison of the exact, spectral, and finite difference solutions.

Figure 5.6 is obtained with the script SPE_SpecMeth.m. It displays the
exact solution, the spectral solution in $\mathbb{P}_{21}$, and the finite difference solution
on 20 points (set m=21 and mdf=20 in the script).

It is clear that in this example, the spectral method is much more accurate
than the finite difference one, since the approximate solution in $\mathbb{P}_{21}$ cannot be
distinguished from the exact one. The error in the supremum norm is $5 \times 10^{-5}$,
whereas it is equal to $6 \times 10^{-2}$ for the finite difference solution. Furthermore, a
finite difference computation using about 800 points would be necessary to
obtain the same order of error as with the spectral method.
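The finite difference side of this comparison can be checked with a short self-contained sketch. It assumes, as in the script above, that problem (5.1) reads $-u'' + cu = f$ with homogeneous Dirichlet conditions on $[-1,1]$. The exact solution and right-hand side are written inline instead of calling SPE_special and SPE_fbe, and the backslash operator replaces linsolve.

```matlab
% Sketch: finite difference error for -u'' + c*u = f, u(-1) = u(1) = 0,
% on a coarse grid (20 intervals) and a fine grid (800 intervals).
c   = 30;
uex = @(x) sin(pi*x) .* cos(10*x);
rhs = @(x) (pi^2 + 130) * sin(pi*x) .* cos(10*x) ...
           + 20*pi * cos(pi*x) .* sin(10*x);
mdfs = [20 800];
errs = zeros(size(mdfs));
for n = 1:numel(mdfs)
    mdf = mdfs(n);  h = 2 / mdf;
    xdf = linspace(-1 + h, 1 - h, mdf - 1)';           % interior nodes
    A = toeplitz([2, -1, zeros(1, mdf - 3)]) / h^2 + c * eye(mdf - 1);
    ydf = A \ rhs(xdf);
    errs(n) = norm(ydf - uex(xdf), inf);
    fprintf('mdf = %4d   sup-norm error = %.2e\n', mdf, errs(n));
end
```

Since the scheme is second-order accurate, refining the grid by a factor of 40 is expected to reduce the error by roughly three orders of magnitude.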

On the other hand, as soon as the solution of the continuous problem 
is not regular enough, the performance of the spectral method in terms of 
accuracy drops and becomes comparable, and even in some cases worse than, 
the performance of the finite difference method. The reader can easily verify 
this fact by building another test case, where the right-hand side f of equation 
(5.1) corresponds to a solution whose second derivative is a step function. 

Signal Processing: Multiresolution Analysis


Project Summary 


Level of difficulty: 1 
Keywords: Approximation, multiresolution analysis, wavelets 


Application fields: Signal processing, image processing 


6.1 Introduction 


This chapter is devoted to a short introduction to multiresolution analysis
(MRA). This is a very promising field in mathematics, with numerous
theoretical and practical developments in engineering applications. Over the past
two decades, wavelet functions have proven to be a very efficient tool for
dealing with problems arising from data compression, and signal and image
processing. Famous examples of applications are the FBI fingerprint database
and the image coding standard JPEG 2000.


6.2 Approximation of a Function: Theoretical Aspect 


6.2.1 Piecewise Constant Functions 


In this section we introduce the basic ideas of multiresolution analysis. Let
$\Omega$ be the interval $[0,1[$, and consider a function $f \in L^1(\Omega)$. For any arbitrary
fixed integer $j > 0$ we define the intervals $\Omega_j^k = [2^{-j}k,\, 2^{-j}(k+1)[$ for
$k = 0,1,\dots,2^j-1$. We then approximate the function $f$ by its projection $P_j f$
onto the family of functions constant on the intervals $\Omega_j^k$ (see Fig. 6.2). The value
of $P_j f$ on $\Omega_j^k$ is computed as

$$P_j^k f = \frac{1}{|\Omega_j^k|}\int_{\Omega_j^k} f(t)\,dt, \quad \text{for } k = 0,1,\dots,2^j-1.$$


We also introduce $\phi = \chi_{[0,1[}$, the characteristic function¹ of $\Omega$, and note first
that $\phi$ satisfies the following property, known as the two-scale relation:

$$\forall x \in \Omega, \quad \phi(x) = \phi(2x) + \phi(2x-1). \tag{6.1}$$


This relation is “plotted” in Fig. 6.1. 
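The two-scale relation can also be checked numerically. The following minimal sketch samples $\phi$ on $[0,1[$ and compares both sides of (6.1); the anonymous function phi and the number of sample points are choices made for this illustration.

```matlab
% Sketch: numerical check of the two-scale relation (6.1) for the Haar
% scaling function phi, the characteristic function of [0,1[.
phi = @(x) double(x >= 0 & x < 1);
x = linspace(0, 1, 1001);  x(end) = [];      % sample points in [0,1[
lhs = phi(x);
rhs = phi(2*x) + phi(2*x - 1);
fprintf('max |phi(x) - phi(2x) - phi(2x-1)| = %g\n', max(abs(lhs - rhs)));
```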











Fig. 6.1. The two-scale relation (Haar basis). 


We remark that $x \mapsto \phi(2^j x - k)$ is the characteristic function of the interval
$\Omega_j^k = [2^{-j}k,\, 2^{-j}(k+1)[$, and we redefine $P_j f$ as

$$\forall x \in \Omega, \quad P_j f(x) = \sum_{k=0}^{2^j-1} P_j^k f\;\phi(2^j x - k).$$


Since $\Omega$ is a bounded domain, $f \in L^1(\Omega)$ whenever $f \in L^2(\Omega)$, and $P_j f$ is
then an element of the vector space

$$V_j = \{ f \in L^2(\Omega),\ f|_{\Omega_j^k} \text{ is constant, for } k = 0,1,\dots,2^j-1 \}.$$

The space $V_j$ has finite dimension $\dim V_j = 2^j$. For $k = 0,1,\dots,2^j-1$,
we define the functions $\phi_j^k$ as

$$\forall x \in \Omega, \quad \phi_j^k(x) = 2^{j/2}\,\phi(2^j x - k). \tag{6.2}$$


¹ In the following, $\chi_{[a,b[}$ is the characteristic function of the interval $[a,b[$.




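The $2^j$ functions $\phi_j^k$ span $V_j$ and are orthonormal relative to the $L^2$ scalar
product: $(f,g) = \int_\Omega f(t)\,g(t)\,dt$. Using this orthonormal basis, we can write

$$\forall x \in \Omega, \quad P_j f(x) = \sum_{k=0}^{2^j-1} (f,\phi_j^k)\,\phi_j^k(x) = \sum_{k=0}^{2^j-1} c_j^k\,\phi_j^k(x),$$

where the coefficients $c_j^k$ are the components of $P_j f$ in the basis $\{\phi_j^k\}_{k=0}^{2^j-1}$; they
are computed according to

$$c_j^k = (f,\phi_j^k) = \int_\Omega f(t)\,\phi_j^k(t)\,dt = 2^{j/2}\int_{\Omega_j^k} f(t)\,dt. \tag{6.3}$$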


The mapping $P_j$ is then the orthogonal projection onto $V_j$ relative to the
$L^2(\Omega)$ scalar product. Consider now two arbitrary integers $j' > j > 0$, and
define as previously two spaces $V_j$ and $V_{j'}$. The basis functions $\{\phi_{j'}^k\}_{k=0}^{2^{j'}-1}$
of the space $V_{j'}$ are constant on intervals $\Omega_{j'}^k$ of length $2^{-j'}$, while the $V_j$
basis functions $\{\phi_j^k\}_{k=0}^{2^j-1}$ are constant on intervals $\Omega_j^k$ of length $2^{-j} > 2^{-j'}$.
Because $V_j \subset V_{j'}$, the function $P_{j'} f$ is a more accurate approximation of $f$
than $P_j f$, in the sense that $\|P_{j'} f - f\|_2 \le \|P_j f - f\|_2$. It can be proven that
the approximation $P_j f$ converges to $f$ in $L^2(\Omega)$ as $j$ goes to infinity (see
Fig. 6.2). Furthermore, when $f \in C^0(\overline{\Omega})$, the approximation $P_j f$ converges
to $f$ in the uniform norm: $\lim_{j\to+\infty} \|f - P_j f\|_\infty = 0$.
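The decrease of the projection error with the level $j$ can be observed with a small experiment. The sketch below is an illustration under stated assumptions: the test function is the one used later in Sect. 6.4, the mean values are approximated by averaging samples on a fine reference grid (level 14, an arbitrary choice), and the $L^2$ norm is replaced by its discrete counterpart on that grid.

```matlab
% Sketch: discrete L2 error of the piecewise constant projection P_j f.
f  = @(x) exp(-x) .* sin(4*pi*x);
Jf = 14;  N = 2^Jf;                          % fine reference grid (assumption)
xf = ((0:N-1)' + 0.5) / N;                   % midpoints of the fine cells
ff = f(xf);
levels = [2 4 6 8];
errs = zeros(size(levels));
for n = 1:numel(levels)
    j = levels(n);
    m = N / 2^j;                             % fine cells per interval Omega_j^k
    means = mean(reshape(ff, m, 2^j), 1);    % approximate mean values P_j^k f
    Pjf = reshape(repmat(means, m, 1), N, 1);
    errs(n) = sqrt(sum((Pjf - ff).^2) / N);  % discrete L2 norm on [0,1[
    fprintf('j = %d   ||P_j f - f||_2 ~ %.3e\n', j, errs(n));
end
```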

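For an arbitrary fixed integer $j > 0$, we consider now the two spaces $V_j$ and
$V_{j+1}$. From (6.1) we may write, for any $f \in L^2(\Omega)$ and for $k = 0,1,\dots,2^j-1$,

$$\sqrt{2}\int_\Omega f(t)\,\phi_j^k(t)\,dt = \int_\Omega f(t)\,\phi_{j+1}^{2k}(t)\,dt + \int_\Omega f(t)\,\phi_{j+1}^{2k+1}(t)\,dt.$$

This leads to a first relation connecting the coefficients $c_j^k$ and $c_{j+1}^k$:

$$c_j^k = (c_{j+1}^{2k} + c_{j+1}^{2k+1})/\sqrt{2}, \quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.4}$$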


Remark 6.1. In the case of an unbounded domain ($\Omega = \mathbb{R}$ for example), we
may change the definition of the space $V_j$ to

$$V_j = \{ f \in L^2(\mathbb{R}),\ f|_{\Omega_j^k} \text{ is constant},\ k \in \mathbb{Z} \}.$$


6.2.2 Decomposition of the Space $V_J$


Consider now an arbitrary fixed integer $J > 0$. Then for any integer $j$
satisfying $J > j \ge 0$, we define successive functional spaces $V_j, V_{j+1}, \dots, V_J$ that
satisfy $V_j \subset V_{j+1} \subset \cdots \subset V_J$. For any given function $f \in L^2(\Omega)$, a standard
way to write the orthogonal projection of $f$ on the subspace $V_{j+1}$ is to
consider $P_{j+1} f$ as the orthogonal projection of $f$ onto $V_j$ along with a correction
term:

Fig. 6.2. Approximating a function using mean values: (a) j = 6; (b) j = 8.


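$$P_{j+1} f = P_j f + (P_{j+1} f - P_j f) = P_j f + Q_j f. \tag{6.5}$$

This relation introduces the new operator $Q_j = P_{j+1} - P_j$, which is actually
the orthogonal projection operator onto $W_j$, the orthogonal complement of $V_j$
in $V_{j+1}$: $V_{j+1} = V_j \oplus W_j$. It is easy to check that the function
$\psi = \chi_{[0,1/2[} - \chi_{[1/2,1[}$ satisfies

$$\psi(x) = \phi(2x) - \phi(2x-1). \tag{6.6}$$

Now we consider the functions $\psi_j^k$ defined by

$$\forall x \in \Omega, \quad \psi_j^k(x) = 2^{j/2}\,\psi(2^j x - k), \quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.7}$$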


The $2^j$ functions $\psi_j^k$ span $W_j$ and are orthonormal relative to the $L^2$ scalar
product. Then for an arbitrary function $f \in L^2(\Omega)$, we compute the
coefficients $d_j^k$ according to

$$d_j^k = (f,\psi_j^k) = \int_\Omega f(t)\,\psi_j^k(t)\,dt = 2^{j/2}\int_{\Omega_{j+1}^{2k}} f(t)\,dt - 2^{j/2}\int_{\Omega_{j+1}^{2k+1}} f(t)\,dt. \tag{6.8}$$

The coefficient $d_j^k$ is the fluctuation of $f$ on the interval $\Omega_j^k$. Using (6.3), we
write first

$$d_j^k = (c_{j+1}^{2k} - c_{j+1}^{2k+1})/\sqrt{2}, \tag{6.9}$$

and then, by adding (6.9) to (6.4), we get

$$\sqrt{2}\,c_{j+1}^{2k} = c_j^k + d_j^k, \quad \text{for } k = 0,1,\dots,2^j-1, \tag{6.10}$$

while subtracting (6.9) from (6.4) leads to

$$\sqrt{2}\,c_{j+1}^{2k+1} = c_j^k - d_j^k, \quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.11}$$





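These relations are connected to the space decomposition $V_{j+1} = V_j \oplus W_j$.
We gather these results in the useful relations (6.12) and (6.13), basic elements
of the decomposition and reconstruction algorithms:

$$c_j^k = (c_{j+1}^{2k} + c_{j+1}^{2k+1})/\sqrt{2}, \qquad d_j^k = (c_{j+1}^{2k} - c_{j+1}^{2k+1})/\sqrt{2}, \quad \text{for } k = 0,1,\dots,2^j-1; \tag{6.12}$$

$$c_{j+1}^{2k} = (c_j^k + d_j^k)/\sqrt{2}, \qquad c_{j+1}^{2k+1} = (c_j^k - d_j^k)/\sqrt{2}, \quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.13}$$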
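One step of these relations can be sketched on a small array. The values below are arbitrary illustration data; since MATLAB indices start at 1, the coefficient $c_{j+1}^{2k}$ sits at the odd position $2k+1$ and $c_{j+1}^{2k+1}$ at the even position $2k+2$.

```matlab
% Sketch: one decomposition step (6.12) followed by one reconstruction
% step (6.13) on arbitrary data (here level j+1 = 3, so 8 coefficients).
cfine = [4; 2; 5; 5; 1; 3; 0; 2];                  % c_{j+1}^k, k = 0..7
c = (cfine(1:2:end) + cfine(2:2:end)) / sqrt(2);   % c_j^k  (6.12)
d = (cfine(1:2:end) - cfine(2:2:end)) / sqrt(2);   % d_j^k  (6.12)

rec = zeros(size(cfine));
rec(1:2:end) = (c + d) / sqrt(2);                  % c_{j+1}^{2k}    (6.13)
rec(2:2:end) = (c - d) / sqrt(2);                  % c_{j+1}^{2k+1}  (6.13)
fprintf('round-trip error: %g\n', norm(rec - cfine, inf));
```

Because the change of basis is orthonormal, the Euclidean norm of [c; d] equals that of cfine, and a pair of equal neighbors (here the values 5, 5) produces a zero detail coefficient.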





Before going any further, we remark that it is possible to iterate the space
decomposition process according to

$$V_J = V_{J-1} \oplus W_{J-1} = V_{J-2} \oplus W_{J-2} \oplus W_{J-1} = \cdots = V_0 \oplus W_0 \oplus \cdots \oplus W_{J-2} \oplus W_{J-1}. \tag{6.14}$$








Since the functions $\phi_j^k$ (respectively $\psi_j^k$) form an orthonormal basis of $V_j$
(respectively $W_j$), we are now able to define many orthonormal bases of $V_J$.
Among all possibilities, the emphasis is put on two particular bases: the
canonical basis, generated by the functions $\phi_J^k$,

$$P_J f = \sum_{k=0}^{2^J-1} c_J^k\,\phi_J^k, \tag{6.15}$$

and the so-called Haar basis, spanned by $\phi_0^0$ and all functions $\psi_j^k$, for
$j = 0,1,2,\dots,J-1$ and $k = 0,1,\dots,2^j-1$:

$$P_J f = c_0^0\,\phi_0^0 + \sum_{j=0}^{J-1}\sum_{k=0}^{2^j-1} d_j^k\,\psi_j^k. \tag{6.16}$$

Remark 6.2. Since $\phi_0^0 = \phi = \chi_{[0,1[}$ is the characteristic function of the
whole domain $\Omega$, the coefficient $c_0^0$ is simply the mean value of $f$ on $\Omega$:
$c_0^0 = \int_\Omega f(t)\,dt$.


Remark 6.3. It is worth noting that the family of functions $\{\phi_J^k\}_{k=0}^{2^J-1}$, which
form an orthonormal basis of the finite-dimensional space $V_J$, converges to an
orthonormal basis of the infinite-dimensional space $L^2(\Omega)$ as $J$ goes to
infinity. So the coefficients $c_J^k$ are also the components of $f$ in the corresponding
basis of $L^2(\Omega)$. Note also that the family of functions $\{\phi_0^0\} \cup \{\psi_j^k\}_{0 \le j < J,\ 0 \le k < 2^j}$,
an orthonormal basis of the finite-dimensional space $V_J$, converges to an
orthonormal basis of the infinite-dimensional space $L^2(\Omega)$ as $J$ goes to infinity.
The coefficients $c_0^0$ and $d_j^k$ are the components of $f$ in the corresponding basis
of $L^2(\Omega)$.

6.2.3 Decomposition and Reconstruction Algorithms


In this section, we look at the standard operations required to switch from
the expression of a function in the canonical basis of $V_J$ to its expression in
the Haar basis, and conversely.

A. Let $f$ be a function in $L^2(\Omega)$ and $J$ an arbitrary fixed integer. We
compute the $2^J$ coefficients $c_J^k$, either exactly if we have an expression for
$f$, or approximately using a sampling of the $2^J$ values of $f$ on the intervals $\Omega_J^k$.
Starting from these $2^J$ coefficients $c_J^k$, we successively compute the coefficients
$c_j^k$ and $d_j^k$ according to the following algorithm:





for j = J-1, ..., 1, 0 compute
    for k = 0, 1, ..., 2^j - 1 compute
        c_j^k = (c_{j+1}^{2k} + c_{j+1}^{2k+1}) / √2
        d_j^k = (c_{j+1}^{2k} - c_{j+1}^{2k+1}) / √2        (6.17)
    end
end

Decomposition algorithm.


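This calculation is referred to as the analysis or decomposition algorithm
of $f$. We may represent step $j$ of this algorithm by the symbolic scheme

$$\begin{array}{ccc}
 & c_j^k \ \ (0 \le k < 2^j) & \\
 \swarrow & & \searrow \\
 c_{j-1}^k \ \ (0 \le k < 2^{j-1}) & & d_{j-1}^k \ \ (0 \le k < 2^{j-1})
\end{array}$$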


Once computed, the $2^j$ coefficients $d_j^k$ are not used in the next steps of
the decomposition algorithm; only the $2^j$ values of the coefficients $\{c_j^k\}_k$ are
required in order to compute the coefficients $\{c_{j-1}^k\}_k$ and $\{d_{j-1}^k\}_k$. The
computational cost of step $j$ in algorithm (6.17) is that of computing $2 \times 2^{j-1}$
coefficients, that is, exactly $2^{j+1}$ operations (one addition or subtraction and one
division per coefficient). The computational cost of the
decomposition algorithm, required to obtain the values of the $2^J$ coefficients,
is then

$$2^{J+1} + 2^{J} + \cdots + 2^{2} = 2^{J+2} - 4 \ \text{operations}.$$

This may be considered as an optimal value, since we are computing $2^J$
outputs from $2^J$ inputs for a cost of $O(2^J)$ operations.
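The decomposition and its operation count can be sketched in a few lines. This is an illustration under assumptions, not the book's solution program: the coefficients $c_J^k$ are approximated by a midpoint rule applied to (6.3) for the test function of Sect. 6.4, the details of each level are kept in a cell array D (a hypothetical container chosen for readability), and the counter nops counts one addition or subtraction and one division per computed coefficient.

```matlab
% Sketch: Haar decomposition (6.17) with an operation counter.
J = 10;
t  = ((0:2^J-1)' + 0.5) / 2^J;                % midpoints of the cells Omega_J^k
cJ = 2^(-J/2) * exp(-t) .* sin(4*pi*t);       % midpoint approximation of (6.3)

c = cJ;  D = cell(J, 1);  nops = 0;
for j = J-1:-1:0
    ev = c(1:2:end);  od = c(2:2:end);        % c_{j+1}^{2k} and c_{j+1}^{2k+1}
    D{j+1} = (ev - od) / sqrt(2);             % d_j^k, k = 0..2^j-1
    c      = (ev + od) / sqrt(2);             % c_j^k, k = 0..2^j-1
    nops   = nops + 2 * (2 * 2^j);            % 2 operations per coefficient
end
c00 = c;                                      % single remaining coefficient c_0^0
energy = c00^2 + sum(cellfun(@(v) sum(v.^2), D));
fprintf('operations: %d   energy defect: %.1e\n', nops, abs(energy - sum(cJ.^2)));
```

The counter grows linearly with the number $2^J$ of input coefficients, and the sum of squares is preserved because the change of basis is orthonormal.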


B. Conversely, assume that we know $c_0^0$, the mean value of $f$ on $\Omega$, and all
other coefficients $d_j^k$ for $j = 0,\dots,J-1$. Then we retrieve all the coefficients
$c_J^k$ using the following algorithm:










for j = 0, ..., J-1 compute
    for k = 0, 1, ..., 2^j - 1 compute
        c_{j+1}^{2k}   = (c_j^k + d_j^k) / √2
        c_{j+1}^{2k+1} = (c_j^k - d_j^k) / √2               (6.18)
    end
end

Reconstruction algorithm.


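This calculation is referred to as the synthesis or the reconstruction
algorithm of $f$. We represent step $j$ of this algorithm by the symbolic scheme

$$\begin{array}{ccc}
 c_{j-1}^k \ \ (0 \le k < 2^{j-1}) & & d_{j-1}^k \ \ (0 \le k < 2^{j-1}) \\
 \searrow & & \swarrow \\
 & c_j^k \ \ (0 \le k < 2^j) &
\end{array}$$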


We remark again that the computational cost of step $j$ of this algorithm
is that of computing $2 \times 2^{j-1}$ coefficients, that is, $2^{j+1}$ operations. The total
computational cost of the reconstruction algorithm is also $O(2^J)$ operations.

Both algorithms (6.17) and (6.18) are efficient tools for obtaining the
components of a function in the Haar basis from its components in the canonical
basis, and conversely. There exist many other orthonormal bases of the space
$V_J$ and as many corresponding algorithms; we shall see two further examples
of such algorithms, which have strong similarity to (6.17) and (6.18). This
calculation is referred to as a multiscale analysis or multiresolution analysis.
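The reconstruction algorithm can be sketched on a tiny example. The Haar coefficients below are arbitrary illustration values, ordered as the mean-value coefficient followed by the details of levels 0, 1, and 2; the variable names are choices made for this sketch.

```matlab
% Sketch: Haar reconstruction (6.18) for J = 3 from arbitrary coefficients
% ordered as [c_0^0, d_0^0, d_1^0, d_1^1, d_2^0, ..., d_2^3].
J  = 3;
dJ = [2; 1; 0; 0.5; 0; 0; 0; 0];
c  = dJ(1);                                   % c_0^0
for j = 0:J-1
    d = dJ(2^j + 1 : 2^(j+1));                % details d_j^k, k = 0..2^j-1
    fine = zeros(2^(j+1), 1);
    fine(1:2:end) = (c + d) / sqrt(2);        % c_{j+1}^{2k}
    fine(2:2:end) = (c - d) / sqrt(2);        % c_{j+1}^{2k+1}
    c = fine;
end
cJ = c;                                       % the 2^J coefficients c_J^k
disp(cJ');
```

Since all level-2 details vanish here, consecutive pairs of entries of cJ are equal, which illustrates that zero details carry no information at that scale.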


6.2.4 Importance of Multiresolution Analysis 


We now look more closely at the data compression aspect included in the
multiresolution analysis formulation. We assume first that the function $f$ is
constant ($f = C \ne 0$) on $\Omega$, and compare two expressions for $P_J f$. The
first is the representation of $f$ in the canonical basis. According to (6.3), all
the $2^J$ coefficients $c_J^k$ are equal (to $2^{-J/2}C$) and are thus different from zero. To
represent $P_J f$ in this basis, an array of $2^J$ components is required. From
another point of view, the expression for $P_J f$ in the Haar basis requires
computing the coefficients $c_{J-1}^k$ and $d_{J-1}^k$ from the $c_J^k$'s according to (6.17). We
obtain immediately $c_{J-1}^k = 2^{-(J-1)/2}C$ and $d_{J-1}^k = 0$ for $k = 0,\dots,2^{J-1}-1$. Going
further in the computation, we see that $c_j^k = 2^{-j/2}C$ and $d_j^k = 0$ for $j = J-1,\dots,0$.
Finally, there is only one nonzero coefficient in the Haar basis: $c_0^0 = C$.

In information processing, attention is focused on the most condensed
expression of a signal, in order to compute information, store it in memory, or
send it through a network. Many algorithms are dedicated to the compression
or decompression of data without any loss. The previous example shows
how useful multiresolution may be in that case. In a more general
case, when a function $f$ is no longer constant, the coefficient $c_j^k$ stands for
the (normalized) mean value of $f$ on $\Omega_j^k$, while the coefficient $d_j^k$ represents the variation
of $f$ at the scale $j$. For any slowly varying function $f$, many coefficients $d_j^k$
have a small value and may be neglected. Then the number of significant
coefficients in the Haar basis representation is far smaller than the number of
coefficients $c_J^k$ present in the canonical basis. Conversely, a large coefficient $d_j^k$
is associated with a fast variation of the function $f$ within $\Omega_j^k$. This property
is of great interest when one wants to look automatically for the singularities
of a function. As an illustration we just mention that special events in the sky
are automatically detected by computers analyzing thousands of photographs
of stars captured daily by telescopes.


Remark 6.4. Fourier analysis is known to be useful for dealing with
oscillating signals. It is a very accurate way to capture the frequencies hidden in a
signal, but its important drawback is the lack of spatial localization of these
oscillations, due to the use of cosine or sine functions oscillating on the whole
domain. This drawback is not present in multiresolution analysis, where the
basis functions have supports limited to the intervals $\Omega_j^k$. Unfortunately, the frequency
localization is then less accurate than with the Fourier basis.





6.3 Multiresolution Analysis: Practical Aspect 


In this section we deal with a very simple example in order to understand the
practical efficiency of the multiresolution analysis theory. But before doing
this, it remains to clarify the way we shall store the different coefficients arising
from the previous algorithms. Let $\Omega \subset \mathbb{R}$ be a bounded interval, $f$ a function
defined on $\Omega$, and $J > 0$ an arbitrary fixed integer. Using the formulas of
algorithm (6.17), we compute for $j = J-1,\dots,0$ the coefficients $c_j^k$ and $d_j^k$, for
$k = 0,1,\dots,2^j-1$. These coefficients are stored in the following way: We first
compute the coefficients $c_J^k$ according to (6.3) and store them in an array $[c_J]$ of
$2^J$ components. Then we compute $c_0^0$ and $\{d_j^k\}_{j,k}$ using (6.17), and store
them in an array $[d_J]$ of $2^J$ components. We begin at step $J$ of algorithm
(6.17) by imposing $[d_J] = [c_J]$. Then at step $J-1$ the $2^{J-1}$ coefficients $c_{J-1}^k$
are stored in the first half of array $[d_J]$ (components 1 up to $2^{J-1}$), while the
$2^{J-1}$ coefficients $d_{J-1}^k$ are stored in the second half of array $[d_J]$ (components
$2^{J-1}+1$ up to $2^J$). At step $J-2$ the $2^{J-2}$ coefficients $c_{J-2}^k$ are stored in
the first quarter of the array (components 1 up to $2^{J-2}$), thus erasing the
now useless values of the coefficients $c_{J-1}^{k'}$ for $k' = 1,2,\dots,2^{J-2}$. Then the $2^{J-2}$
coefficients $d_{J-2}^k$ are stored in the second quarter of the array (components
$2^{J-2}+1$ up to $2^{J-1}$), thus erasing the remaining useless values of the coefficients
$c_{J-1}^{k'}$ for $k' = 2^{J-2}+1,\dots,2^{J-1}$. Note that during this operation the $[d_J]$
components from $2^{J-1}+1$ up to $2^J$, which correspond to the coefficients $d_{J-1}^k$,
are not modified. Proceeding in this way until step $j = 0$, we finally get the
following array $[d_J]$:

$$[d_J] = \bigl[\,c_0^0,\ d_0^0,\ d_1^0,\ d_1^1,\ \dots,\ d_j^0,\dots,d_j^{2^j-1},\ \dots,\ d_{J-1}^0,\dots,d_{J-1}^{2^{J-1}-1}\,\bigr]. \tag{6.19}$$


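This storage is related to the decomposition of the space $V_J$ according to
the scheme

$$\begin{array}{ccccccccc}
V_J & \longrightarrow & V_{J-1} & \longrightarrow & V_{J-2} & \longrightarrow & \cdots & \longrightarrow & V_0 \\
 & \searrow & & \searrow & & \searrow & & \searrow & \\
 & & W_{J-1} & & W_{J-2} & & \cdots & & W_0
\end{array}$$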
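The storage convention can be sketched as an in-place transform. This is an illustration under assumptions: the coefficients $c_J^k$ are approximated by a midpoint rule, a temporary copy tmp is used at each step so that values are not overwritten before being read, and the check at the end compares one stored detail with its direct expression $2^{(j-J)/2}$ times the difference of the sums of the $c_J^m$ over the two halves of $\Omega_j^k$. With 1-based indexing, $d_j^k$ is found at position $2^j + k + 1$.

```matlab
% Sketch: in-place Haar decomposition producing the array [d_J] of (6.19).
J  = 10;
t  = ((0:2^J-1)' + 0.5) / 2^J;
cJ = 2^(-J/2) * exp(-t) .* sin(4*pi*t);        % approximate c_J^k
dJ = cJ;                                       % step J: [d_J] = [c_J]
for j = J-1:-1:0
    n   = 2^j;
    tmp = dJ(1:2*n);                           % copy of the c_{j+1}^k
    dJ(1:n)     = (tmp(1:2:end) + tmp(2:2:end)) / sqrt(2);   % c_j^k
    dJ(n+1:2*n) = (tmp(1:2:end) - tmp(2:2:end)) / sqrt(2);   % d_j^k
end

j = 3;  k = 5;                                 % arbitrary coefficient to check
len   = 2^(J-j);                               % cells of level J inside Omega_j^k
cells = cJ(k*len + (1:len));
direct = 2^((j-J)/2) * (sum(cells(1:len/2)) - sum(cells(len/2+1:end)));
fprintf('stored: %.6e   direct: %.6e\n', dJ(2^j + k + 1), direct);
```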


6.4 Multiresolution Analysis: Implementation 


Let $f$ be the function defined on $\Omega = [0,1]$ by $f(x) = \exp(-x)\sin 4\pi x$. We
choose $J = 10$ and compute the arrays $[c_J]$ and $[d_J]$ associated with $P_J f$.


Exercise 6.1.
1. Write a program that computes all coefficients $c_J^k$ according
   to (6.3). Store these coefficients in an array $[c_J]$ with $2^J$ components.
2. Using the decomposition algorithm (6.17), compute for $j = J-1,\dots,0$
   all coefficients $c_j^k$ and $d_j^k$, for $k = 0,1,\dots,2^j-1$. Store these coefficients
   in an array $[d_J]$ with $2^J$ components, as detailed in the previous section.
3. Write a program that computes all coefficients $c_J^k$ from the $[d_J]$
   components, according to the reconstruction algorithm (6.18). Check the results.








A solution of this exercise is proposed in Sect. 6.7 at page 148. Using these 
direct and inverse transformation programs, one can perform some numerical 
experiments. We shall deal now with an example of a compression algorithm. 


Exercise 6.2.
1. Calculate the number of coefficients in the array $[d_J]$ whose
   absolute values are greater than $\varepsilon = 2^{-J/2} \times 10^{-6}$.

2. Copy the array $[d_J]$ into a new array $[d_J^\varepsilon]$ and set to zero every component of
   $[d_J^\varepsilon]$ whose absolute value is less than $\varepsilon$ ($d_j^{k,\varepsilon} = 0$ when $|d_j^k| < \varepsilon$).

3. Compute the array $[c_J^\varepsilon]$ from $[d_J^\varepsilon]$ using the reconstruction algorithm
   (6.18).

4. Visualize the resulting signal and compare both curves representing $P_J f$
   and $P_J^\varepsilon f$.

5. Study the variations of $\|P_J^\varepsilon f - P_J f\|_2$ and the number of nonzero
   coefficients in $[d_J^\varepsilon]$ as $\varepsilon$ varies.


A solution of this exercise is proposed in Sect. 6.7 at page 148. Table 6.1
displays results of this experiment (with $f(x) = \exp(-x)\sin 4\pi x$ and $J = 10$).
The number of nonzero coefficients is reported next to the threshold value,
with the corresponding relative error $\|P_J^\varepsilon f - P_J f\|_2 / \|P_J f\|_2$. Fig. 6.3 plots two
signals reconstructed after thresholding. We emphasize here the compression
capability of the method: using only 352 coefficients instead of the 1024 sample
values, we obtain a relative error of 0.8%.





Threshold | Coefficients | Relative error
          |              | 0.0402
          |              | 0.0258
          | 352          | 0.0080
          |              | 0.0042
          |              | 0.0003
          |              | 0.0001
          |              | 0.00001

Table 6.1. Thresholding (Haar wavelet).
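The thresholding experiment can be sketched as follows. This is a hedged illustration and not the solution of Sect. 6.7: the coefficients $c_J^k$ are approximated by a midpoint rule, the list of thresholds epsList is hypothetical, and the values printed by this sketch are not meant to reproduce Table 6.1.

```matlab
% Sketch: thresholding of the Haar coefficients and relative L2 error.
J  = 10;
t  = ((0:2^J-1)' + 0.5) / 2^J;
cJ = 2^(-J/2) * exp(-t) .* sin(4*pi*t);          % approximate c_J^k

dJ = cJ;                                         % decomposition (6.17)
for j = J-1:-1:0
    n = 2^j;  tmp = dJ(1:2*n);
    dJ(1:n)     = (tmp(1:2:end) + tmp(2:2:end)) / sqrt(2);
    dJ(n+1:2*n) = (tmp(1:2:end) - tmp(2:2:end)) / sqrt(2);
end

epsList = 2^(-J/2) * [1e-3 1e-2 1e-1 1];         % hypothetical thresholds
kept = zeros(size(epsList));  relerr = zeros(size(epsList));
for n = 1:numel(epsList)
    dEps = dJ .* (abs(dJ) >= epsList(n));        % zero the small coefficients
    kept(n) = nnz(dEps);
    c = dEps(1);                                 % reconstruction (6.18)
    for j = 0:J-1
        d = dEps(2^j + 1 : 2^(j+1));
        fine = zeros(2^(j+1), 1);
        fine(1:2:end) = (c + d) / sqrt(2);
        fine(2:2:end) = (c - d) / sqrt(2);
        c = fine;
    end
    relerr(n) = norm(c - cJ) / norm(cJ);
    fprintf('eps = %.2e   kept = %4d   relative error = %.2e\n', ...
            epsList(n), kept(n), relerr(n));
end
```

Because the Haar basis is orthonormal, the relative error can also be read directly from the discarded coefficients, without performing the reconstruction.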


[Figure: two panels, each titled "Haar wavelet", showing the reconstructed signal.]

Fig. 6.3. Reconstruction after thresholding: J = 10. (a) ε = 0.10; (b) ε = 0.01.

6.5 Introduction to Wavelet Theory


6.5.1 Scaling Functions and Wavelets 


If you have correctly performed the previous numerical experiments, you
deserve our warmest congratulations for your first steps in the fabulous world
of wavelets! When Haar proposed (in 1910!) the construction of an
orthonormal basis of the space $L^2(\Omega)$ like the one discussed previously, he was in fact
far from imagining the practical importance of his discovery. Wavelet
theory was founded in the sixties, arising from an idea of a petroleum engineer
named Morlet, who was looking for algorithms well suited to seismic signal
processing, and more accurate than Fourier analysis (see Meyer, 1990).





[Figure: two panels, titled "Haar wavelet" and "Haar wavelets".]

Fig. 6.4. Haar wavelets (J = 10). (a) Single wavelet; (b) shifted wavelets.


The Haar basis construction is the foundation of multiresolution analysis.
In the associated terminology, the functions $\phi_j^k$ are called scaling functions,
while the functions $\psi_j^k$ are wavelet functions, or wavelets, for short. What is the
appearance of a wavelet? Have a look at the previous numerical experiments:
for $J = 10$ we begin by setting to zero all components of the array $[d_J]$ and give
the value 1 to only one of the coefficients $d_j^k$;² then using the reconstruction
algorithm (6.18), we obtain the associated wavelet $\psi_j^k$, as plotted in Fig. 6.4(a).
Note that the integers $k$ and $j$ have to satisfy the following conditions: $0 \le j < J$
and $0 \le k < 2^j$. Choose now another integer $k' \ne k$ such that $0 \le k' < 2^j$
and repeat the experiment. This leads to a wavelet $\psi_j^{k'}$, which appears (see
Fig. 6.4(b)) to be shifted from the wavelet $\psi_j^k$ by an offset of $2^{-j}(k'-k)$. Both
wavelets belong to the subspace $W_j$ and are hence called level-$j$ wavelets. All
level-$j$ wavelets (there are $2^j$ of them at level $j$) are obtained from any one
of them by a shift of $p\,2^{-j}$ ($p$ an integer).


² The level-$j$ coefficient $d_j^k$ is stored in the $(2^j + k + 1)$th component of $[d_J]$ (see
(6.19)).

Now, if we choose two integers $j'$ and $k''$ such that $0 \le j' \ne j < J$
and $0 \le k'' < 2^{j'}$, the corresponding wavelet $\psi_{j'}^{k''}$ has features similar to the
previous wavelets: same global shape but different size (by (6.7), the amplitude has been
multiplied by $2^{(j'-j)/2}$, while the length of the support has been divided by $2^{j'-j}$; see
Fig. 6.4(b)).
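The experiment just described can be sketched as follows. The three pairs $(j,k)$ in specs are arbitrary choices for this illustration: the first two give shifted wavelets of the same level, and the third a wavelet of a finer level. Each column of W holds the coefficients $c_J^k$ of one wavelet in the canonical basis of $V_J$.

```matlab
% Sketch: Haar wavelets obtained by reconstructing from a single unit
% coefficient d_j^k placed at position 2^j + k + 1 of [d_J].
J = 10;
specs = [3 2; 3 5; 5 20];                      % rows [j k], arbitrary choices
W = zeros(2^J, size(specs, 1));
for n = 1:size(specs, 1)
    j0 = specs(n, 1);  k0 = specs(n, 2);
    dJ = zeros(2^J, 1);
    dJ(2^j0 + k0 + 1) = 1;                     % only one nonzero coefficient
    c = dJ(1);
    for j = 0:J-1                              % reconstruction (6.18)
        d = dJ(2^j + 1 : 2^(j+1));
        fine = zeros(2^(j+1), 1);
        fine(1:2:end) = (c + d) / sqrt(2);
        fine(2:2:end) = (c - d) / sqrt(2);
        c = fine;
    end
    W(:, n) = c;
    fprintf('(j,k) = (%d,%d): support of %d cells, amplitude %.4f\n', ...
            j0, k0, nnz(c), max(abs(c)));
end
x = (0:2^J-1)' / 2^J;                          % left ends of the cells Omega_J^k
% plot(x, W) would display the three wavelets.
```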

The whole space $V_J \subset L^2(\Omega)$ is then generated by direct summation of
the subspace $V_0$ and all orthogonal subspaces $W_j$, each of which is spanned
by $2^j$ wavelet functions. When $J$ goes to infinity, we retrieve the structure
introduced by Haar (see Remark 6.3): $L^2(\Omega)$ is a direct summation of
finite-dimensional orthogonal subspaces. Moreover, for any $f \in L^2(\Omega)$, the
multiresolution analysis can be summarized in








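$$\begin{aligned}
P_J f &= (f,\phi_0^0)\,\phi_0^0 + \sum_{j=0}^{J-1}\sum_{k=0}^{2^j-1} (f,\psi_j^k)\,\psi_j^k, \\
f &= (f,\phi_0^0)\,\phi_0^0 + \sum_{j \ge 0}\sum_{k=0}^{2^j-1} (f,\psi_j^k)\,\psi_j^k, \\
f &= P_J f + \sum_{j \ge J}\sum_{k=0}^{2^j-1} (f,\psi_j^k)\,\psi_j^k.
\end{aligned} \tag{6.20}$$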





Hence the coefficients $c_0^0$ and $d_j^k$ are the components of $f$ in the
corresponding basis of $L^2(\Omega)$. From that point of view, wavelet theory appears to be
a powerful tool for approximating functions. Since the relation $V_{j+1} = V_j \oplus W_j$
holds at any level $j < J$, we may consider the subspace $W_j$ as a set of detail
functions, that is, the functions we have to add to the functions of $V_j$ in order
to retrieve all $V_{j+1}$ functions.

By fixing an integer $J < +\infty$, we restrict the space description to the scale
$2^{-J}$, and consequently we are unable to capture the variation $|f(x) - f(x')|$
when $|x - x'| < 2^{-J}$. On the other hand, we may write $\|P_J f - f\| \le C\,2^{-J}$
for any function $f$ in $L^2(\Omega)$ with bounded variation. This means we know
exactly the accuracy of an approximation $P_J f$ of a given $f$; moreover, we also
know the price to pay to improve this result: we have to compute at least $2^J$
new coefficients.

In the previous example, all scaling functions are derived from the same
function $\phi$ by the relation $\phi_j^k(x) = 2^{j/2}\phi(2^j x - k)$. Likewise, the relation
$\psi_j^k(x) = 2^{j/2}\psi(2^j x - k)$ shows that all wavelet functions are derived from the same
function $\psi$, sometimes called the mother wavelet function. In this first study, both
$\phi$ and $\psi$ are discontinuous functions; it follows that the approximating
function $P_J f$ is also discontinuous, even when $f$ is continuous. How is one to
get a more regular approximation? Much work has been done to answer this
question; there exist abstract necessary conditions on the pair $(\phi, \psi)$ in order
to generate a general framework for multiresolution analysis. In short, it is
possible to build continuous wavelet approximations for a given continuous
function; however, these aspects of wavelet theory are beyond our scope. We
refer to Cohen and Ryan (1995), Cohen (2000, 2003), Daubechies (1992), and
Mallat (1997) for further details. In the following sections, we introduce two
examples of continuous wavelets: the Schauder and the Daubechies wavelets.

For the sake of simplicity we shall limit our study to periodic continuous
functions on $\Omega$. Although wavelet theory is able to address the general case,
it needs some technical modifications that we want to skip here.


6.5.2 The Schauder Wavelet 


We follow here the outline of the previous section and introduce a new
definition of the space $V_j$:

$$V_j = \{ f \in C^0(\Omega),\ f|_{\Omega_j^k} \text{ is affine, for } k = 0,1,\dots,2^j-1 \} \subset L^2(\Omega).$$


We are now dealing with piecewise linear functions continuous on $\Omega$. We
consider first the function $\phi$ defined by

$$\phi(x) = \max(0,\ 1 - |x|). \tag{6.21}$$

The function $\phi$ satisfies the two-scale relation (see Fig. 6.5)

$$\phi(x) = \tfrac{1}{2}\,\phi(2x-1) + \phi(2x) + \tfrac{1}{2}\,\phi(2x+1). \tag{6.22}$$
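As for the Haar case, this two-scale relation can be checked numerically. The sketch below evaluates both sides of (6.22) for the hat function (6.21) on sample points covering its support; the sampling interval and the number of points are arbitrary choices.

```matlab
% Sketch: numerical check of the two-scale relation (6.22) for the hat
% function phi(x) = max(0, 1 - |x|).
phi = @(x) max(0, 1 - abs(x));
x = linspace(-1.5, 1.5, 601);
lhs = phi(x);
rhs = 0.5 * phi(2*x - 1) + phi(2*x) + 0.5 * phi(2*x + 1);
resid = max(abs(lhs - rhs));
fprintf('max residual of (6.22): %g\n', resid);
```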





Then we define the functions $\phi_j^k$ by scaling and shift:

$$\phi_j^k(x) = 2^{j/2}\,\phi(2^j x - k), \quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.23}$$

The functions $\phi_j^k$ are known as hat functions in the finite element method:
though they do not form an orthogonal basis of $V_j$, we shall admit that
they provide the Schauder wavelets by the definition $\psi_j^k = \phi_{j+1}^{2k+1}$ for
$k = 0,1,\dots,2^j-1$. It is not the only eligible choice in that case, but this is the
simplest one (for more details see Mallat (1997)). Fig. 6.6 displays a set of
Schauder wavelets when $J = 3$. For any function $f$ we consider again the two
representations

$$P_J f = \sum_{k=0}^{2^J-1} c_J^k\,\phi_J^k \quad \text{and} \quad P_J f = c_0^0\,\phi_0^0 + \sum_{j=0}^{J-1}\sum_{k=0}^{2^j-1} d_j^k\,\psi_j^k. \tag{6.24}$$


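This definition leads to the decomposition formulas

$$\begin{cases}
c_j^k = \sqrt{2}\; c_{j+1}^{2k}, \\[2pt]
d_j^k = \sqrt{2}\,\bigl[\, c_{j+1}^{2k+1} - \tfrac{1}{2}\,( c_{j+1}^{2k} + c_{j+1}^{2k+2} ) \,\bigr],
\end{cases}
\quad \text{for } k = 0,1,\dots,2^j-1, \tag{6.25}$$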

Fig. 6.5. The two-scale relation (Schauder basis).


[Figure: panel titled "Schauder wavelets".]

Fig. 6.6. Some Schauder wavelets (J = 3).


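as well as the reconstruction formulas

$$\begin{cases}
c_{j+1}^{2k} = \dfrac{1}{\sqrt{2}}\; c_j^k, \\[6pt]
c_{j+1}^{2k+1} = \dfrac{1}{\sqrt{2}}\,\bigl[\, d_j^k + \tfrac{1}{2}\,( c_j^k + c_j^{k+1} ) \,\bigr],
\end{cases}
\quad \text{for } k = 0,1,\dots,2^j-1. \tag{6.26}$$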

Any coefficient $c_j^k$ stands here for the (normalized) value of $f$ at the point
$x_j^k = 2^{-j}k$, and a usual interpretation of (6.25) is that the decomposition
algorithm proceeds by elimination, keeping only point values of even indices
when going from level $j+1$ to level $j$. Similarly, the coefficient $d_j^k$ appears to
be the difference between the (normalized) odd-index point value $c_{j+1}^{2k+1}$ and
the linear interpolation of the adjacent (normalized) even-index point values
$c_{j+1}^{2k}$ and $c_{j+1}^{2k+2}$; it is known as the detail, that is, the value to be added to the
current level-$j$ values $c_j^k$ and $c_j^{k+1}$ in order to obtain the level-$(j+1)$ value
$c_{j+1}^{2k+1}$, as can be seen in the reconstruction algorithm (6.26). This process is
similar to the multiresolution idea introduced in (6.5) and (6.20). Fig. 6.7
shows a graphical interpretation of this computation.








Fig. 6.7. Values and details. 


A computer implementation of the relations (6.25-6.26) is easily obtained 
from the previous algorithms (6.17-6.18). This is a very attractive property of 
wavelet theory. All decomposition and reconstruction algorithms are similar 
to (6.17) and (6.18); switching from a particular wavelet basis to another one 
results from a slight change in the formulas. Moreover, this change arises only 
from the corresponding two-scale relations. Consequently, both $[c_J]$ and $[d_J]$
share the same structure of $2^J$-component arrays. The general computation,
common to all decomposition and reconstruction algorithms, is known as the 
Mallat transform (see Mallat, 1997). 
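
As an illustration of relations (6.25) and (6.26), the following Python sketch performs the full periodic decomposition and reconstruction. It is not the MRA_schauder.m procedure of the book: the function names, the use of plain lists, and the storage order of the output array (coarsest value first, then the details level by level) are assumptions made for this example, and the samples are not normalized.

```python
import math

SQRT2 = math.sqrt(2.0)


def schauder_decompose(c_fine):
    """Apply (6.25) level by level to a periodic array of length 2**J.

    Returns [c_0^0, d_0, d_1 (2 values), ..., d_{J-1} (2**(J-1) values)];
    this storage order is an assumption made for the sketch.
    """
    n = len(c_fine)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    c = list(c_fine)
    details = []
    while n > 1:
        half = n // 2
        # Index 2k+2 wraps around to 0 at the right edge (periodicity).
        d = [(c[2 * k + 1] - 0.5 * (c[2 * k] + c[(2 * k + 2) % n])) / SQRT2
             for k in range(half)]
        c = [c[2 * k] / SQRT2 for k in range(half)]
        details.append(d)
        n = half
    out = [c[0]]
    for d in reversed(details):  # coarsest details first
        out.extend(d)
    return out


def schauder_reconstruct(d_array):
    """Invert schauder_decompose with the reconstruction formulas (6.26)."""
    c = [d_array[0]]
    pos = 1
    while pos < len(d_array):
        half = len(c)
        d = d_array[pos:pos + half]
        fine = [0.0] * (2 * half)
        for k in range(half):
            fine[2 * k] = SQRT2 * c[k]
            # c[k+1] wraps around to c[0] at the right edge (periodicity).
            fine[2 * k + 1] = SQRT2 * (d[k] + 0.5 * (c[k] + c[(k + 1) % half]))
        c = fine
        pos += half
    return c
```

Decomposing an array and reconstructing it immediately should return the original values up to rounding errors, which gives a simple check of both routines.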


6.5.3 Implementation of the Schauder Wavelet 


Let f be the function defined on $\Omega = [0,1]$ by $f(x) = \exp(-x) \sin 4\pi x$. We
choose J = 10 and compute the arrays $[c_J]$ and $[d_J]$ associated with $P_J f$.


Exercise 6.3. 1. Write a program that computes all coefficients $c_J^k$ according to (6.25). Store these coefficients in an array $[c_J]$ with $2^J$ components.

2. Using the decomposition algorithm (6.25), compute for $j = J-1, \ldots, 0$ all coefficients $c_j^k$ and $d_j^k$, for $k = 0, 1, \ldots, 2^j - 1$. Store these coefficients in an array $[d_J]$ with $2^J$ components, as detailed in the previous section.

3. Write a program that computes all coefficients $c_J^k$ from the $[d_J]$ components, according to the reconstruction algorithm (6.26). Check the results.





Remark 6.5. The use of periodic functions requires a particular treatment at both edges of the domain. More precisely, formulas (6.25) and (6.26) use the relation $c_j^{2^j} = c_j^0$ at every level $j$.


A solution of this exercise is proposed in Sect. 6.7 at page 148. We deal 
again with an example of the compression algorithm. 


Exercise 6.4. 1. Calculate the number of coefficients in the array $[d_J]$ whose absolute values are greater than $\varepsilon = 2^{-J/2} \times 10^{-5}$.

2. Copy array $[d_J]$ in a new array $[d_J^\varepsilon]$ and set to zero each component of $[d_J^\varepsilon]$ whose absolute value is less than ε ($d_k^\varepsilon = 0$ when $|d_k| < \varepsilon$).

3. Compute the array $[c_J^\varepsilon]$ from $[d_J^\varepsilon]$ using the reconstruction algorithm (6.26).

4. Visualize the resulting signal and compare both curves representing $P_J f$ and $P_J^\varepsilon f$.

5. Study the variations of $\|P_J^\varepsilon f - P_J f\|_2$ and the number of nonzero coefficients in $[d_J^\varepsilon]$ as ε varies.
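
The thresholding steps of this exercise can be sketched in a few lines of Python. The helper names below are hypothetical, and the discrete 2-norm of the sampled values is used here as a stand-in for the norm of the functions; this is an assumption of the sketch, not a statement about the programs of Sect. 6.7.

```python
import math


def count_significant(d, eps):
    """Number of coefficients whose absolute value is greater than eps."""
    return sum(1 for x in d if abs(x) > eps)


def threshold(d, eps):
    """Copy of d in which every entry with |entry| < eps is set to zero."""
    return [x if abs(x) >= eps else 0.0 for x in d]


def relative_l2_error(approx, reference):
    """Discrete relative error ||approx - reference||_2 / ||reference||_2."""
    num = math.sqrt(sum((a - r) ** 2 for a, r in zip(approx, reference)))
    den = math.sqrt(sum(r ** 2 for r in reference))
    return num / den


if __name__ == "__main__":
    d = [0.5, -0.001, 0.02, 0.0, -0.3]
    for eps in (0.1, 0.01, 0.0001):
        kept = count_significant(threshold(d, eps), 0.0)
        print(eps, kept)
```

In a complete experiment, the thresholded array would be passed to the reconstruction algorithm, and `relative_l2_error` would compare the reconstructed values with the original ones for each value of ε.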





A solution of this exercise is proposed in Sect. 6.7 at page 149. Table 6.2
displays results of this experiment (with $f(x) = \exp(-x) \sin 4\pi x$ and J = 10).
The number of nonzero coefficients is reported next to the threshold value,
with the corresponding relative error $\|P_J^\varepsilon f - P_J f\|_2 / \|P_J f\|_2$. Fig. 6.8 plots
two signals reconstructed after thresholding. We emphasize the spectacular
compression capacity of the method: using only 77 coefficients arising from
1024 values, we obtain a 0.2% relative error. We see that for a smaller number
of significant components in array $[d_J]$, we get a better approximation with
the Schauder wavelet than with the Haar wavelet. This is not a big surprise
because $P_J f$ is now a continuous approximation of the same continuous function
f. Note that there is a small increase of the computational cost, due to
the use of more coefficients in formulas (6.25) and (6.26).





6.5.4 The Daubechies Wavelet 





Is it possible to improve these results? Cohen (2003) has proven that a mul- 
tiresolution analysis is available as soon as there exists a generalized two-scale 
relation such as 


\[ \varphi(x) = \sum_{k \in \mathbb{Z}} h_k \varphi(2x - k). \tag{6.27} \]


In signal-processing theory, the $h_k$'s are the components of an array $h$,
called a filter. Knowledge of a filter is a necessary and sufficient condition to
build a multiresolution analysis.


Threshold | Coefficients | Relative error
          |              | 0.0155
          |              | 0.0072
          |              | 0.0020
          |              | 0.0010
          |              | 0.0003
          |              | 0.0002
          |              | 0.00005

Table 6.2. Thresholding (Schauder wavelet).


Fig. 6.8. Reconstruction after thresholding: J = 10. (a) ε = 0.10; (b) ε = 0.01.


From (6.27) one may write the complementary relation


\[ \psi(x) = \sum_{k \in \mathbb{Z}} \tilde{h}_k \varphi(2x - k). \tag{6.28} \]


Both relations are related to decomposition and reconstruction algorithms,
as previously established in (6.17-6.18) for the Haar wavelet, or (6.25-6.26) for
the Schauder wavelet. Daubechies (1992) has proven that the regularity of the
mother wavelet depends on the filter length, that is, the number of nonzero
coefficients $h_k$ used in relation (6.27). A general method for defining
compact-support wavelets with arbitrary regularity has been proposed, introducing the
family of Daubechies wavelets. In a nutshell, the more nonzero coefficients
appear in the two-scale relation (6.27), the more accurate the wavelet
approximation is. To conclude this study, we now deal with the Daubechies
wavelet D4, which is defined by the following formulas:


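Decomposition:

\[ \begin{aligned}
c_j^k &= C_0\, c_{j+1}^{2k-1} + C_1\, c_{j+1}^{2k} + C_2\, c_{j+1}^{2k+1} + C_3\, c_{j+1}^{2k+2}, \\
d_j^k &= C_3\, c_{j+1}^{2k-1} - C_2\, c_{j+1}^{2k} + C_1\, c_{j+1}^{2k+1} - C_0\, c_{j+1}^{2k+2}.
\end{aligned} \tag{6.29} \]

Reconstruction:

\[ \begin{aligned}
c_{j+1}^{2k} &= C_3\, c_j^{k-1} - C_0\, d_j^{k-1} + C_1\, c_j^k - C_2\, d_j^k, \\
c_{j+1}^{2k+1} &= C_2\, c_j^k + C_1\, d_j^k + C_0\, c_j^{k+1} + C_3\, d_j^{k+1}.
\end{aligned} \tag{6.30} \]

According to Daubechies (1992), the values of the coefficients $C_i$ are, respectively,

\[ C_0 = \frac{1 + \sqrt{3}}{4\sqrt{2}}, \qquad C_1 = \frac{3 + \sqrt{3}}{4\sqrt{2}}, \qquad C_2 = \frac{3 - \sqrt{3}}{4\sqrt{2}}, \qquad C_3 = \frac{1 - \sqrt{3}}{4\sqrt{2}}. \tag{6.31} \]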
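
To make formulas (6.29)-(6.31) concrete, here is a Python sketch of a single decomposition step and of its inverse for a periodic array. It is only an illustration under stated assumptions: the function names are hypothetical, indices falling outside the array wrap around periodically, and only one level (from j + 1 to j) is treated.

```python
import math

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
# Coefficients C_0, ..., C_3 of (6.31).
C = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
     (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]


def d4_decompose_step(fine):
    """One level of (6.29) on a periodic array of even length."""
    n = len(fine)
    half = n // 2
    coarse, detail = [], []
    for k in range(half):
        # Values at indices 2k-1, 2k, 2k+1, 2k+2, taken modulo n.
        x = [fine[(2 * k - 1 + m) % n] for m in range(4)]
        coarse.append(C[0] * x[0] + C[1] * x[1] + C[2] * x[2] + C[3] * x[3])
        detail.append(C[3] * x[0] - C[2] * x[1] + C[1] * x[2] - C[0] * x[3])
    return coarse, detail


def d4_reconstruct_step(coarse, detail):
    """One level of (6.30), the inverse of d4_decompose_step."""
    half = len(coarse)
    fine = [0.0] * (2 * half)
    for k in range(half):
        km, kp = (k - 1) % half, (k + 1) % half  # periodic neighbours
        fine[2 * k] = (C[3] * coarse[km] - C[0] * detail[km]
                       + C[1] * coarse[k] - C[2] * detail[k])
        fine[2 * k + 1] = (C[2] * coarse[k] + C[1] * detail[k]
                           + C[0] * coarse[kp] + C[3] * detail[kp])
    return fine
```

Since the coefficients (6.31) satisfy $C_0^2 + C_1^2 + C_2^2 + C_3^2 = 1$ and $C_0 C_2 + C_1 C_3 = 0$, the step is an orthogonal transformation: it preserves the sum of squares of the array, and a constant array produces zero details.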


6.5.5 Implementation of the Daubechies Wavelet D4 


Let f be the function defined on $\Omega = [0,1]$ by $f(x) = \exp(-x) \sin 4\pi x$. We
choose J = 10 and compute the arrays $[c_J]$ and $[d_J]$ associated with $P_J f$.


Exercise 6.5. 1. Write a program that computes all coefficients $c_J^k$ according to (6.29). Store these coefficients in an array $[c_J]$ with $2^J$ components.

2. Using the decomposition algorithm (6.29), compute for $j = J-1, \ldots, 0$ all coefficients $c_j^k$ and $d_j^k$, for $k = 0, 1, \ldots, 2^j - 1$. Store these coefficients in an array $[d_J]$ with $2^J$ components, as detailed in the previous sections.

3. Write a program that computes all coefficients $c_J^k$ from the $[d_J]$ components, according to the reconstruction algorithm (6.30). Check the results.

4. Visualize a Daubechies wavelet (see Fig. 6.9; what a surprise!).











Remark 6.6. As previously noticed, we consider here periodic functions, and 
special formulas are required to treat the domain edges. 


A solution of this exercise is proposed in Sect. 6.7 at page 149. We deal 
again with an example of a compression algorithm. 


Exercise 6.6. 1. Calculate the number of coefficients in array $[d_J]$ whose absolute values are greater than $\varepsilon = 2^{-J/2} \times 10^{-5}$.

2. Copy array $[d_J]$ in a new array $[d_J^\varepsilon]$ and set to zero each component of $[d_J^\varepsilon]$ whose absolute value is less than ε ($d_k^\varepsilon = 0$ when $|d_k| < \varepsilon$).

3. Compute the array $[c_J^\varepsilon]$ from $[d_J^\varepsilon]$ using the reconstruction algorithm (6.30).

4. Visualize the resulting signal and compare both curves representing $P_J f$ and $P_J^\varepsilon f$.

5. Study the variations of $\|P_J^\varepsilon f - P_J f\|_2$ and the number of nonzero coefficients in $[d_J^\varepsilon]$ as ε varies.


Fig. 6.9. The Daubechies wavelet D4.


A solution of this exercise is proposed in Sect. 6.7 at page 149. Table 6.3
displays results of this experiment (with $f(x) = \exp(-x) \sin 4\pi x$ and J = 10).
The number of nonzero coefficients is reported next to the threshold
value, with the corresponding relative error $\|P_J^\varepsilon f - P_J f\|_2 / \|P_J f\|_2$. Fig. 6.10
plots two signals reconstructed after thresholding. We emphasize again the
spectacular compression capacity of the method: by using only 78 coefficients
arising from 1024 values, we obtain a 0.3% relative error.


Threshold | Coefficients | Relative error 





Table 6.3. Thresholding (Daubechies wavelet D4).


Fig. 6.10. Reconstruction after thresholding: J = 10. (a) ε = 0.10; (b) ε = 0.01.


6.6 Generalization: Image Processing 


Generalization of the previous results to image processing is straightforward.
We might define a wavelet function of two variables $\psi_2(x, y)$; however, the
tensor product is easier to deal with. We then consider a mother wavelet of the
form $\psi(x, y) = \psi(x)\psi(y)$. This choice introduces a two-dimensional Mallat
transform, where the image to treat is a matrix $[c_J]$: we first perform
a row-by-row decomposition. The resulting transformed rows are stored
in a matrix $[\tilde{c}_J]$; we then perform a decomposition of the columns of $[\tilde{c}_J]$,
and the final results are stored (column by column) in a matrix $[d_J]$. This
matrix contains all the components of the image in the wavelet basis. It is
then possible to compress this object using a thresholding algorithm, and the
compressed data are stored in a matrix $[d_J^\varepsilon]$. The use of a column-by-column
reconstruction algorithm followed by a row-by-row reconstruction algorithm
provides a new image $[c_J^\varepsilon]$ from $[d_J^\varepsilon]$. From a practical point of view, some
operations may be performed using the $[d_J]$ representation directly rather
than the initial $[c_J]$ one:


• Two distinct images may be compared in the (compressed) wavelet format;
this is very helpful for saving computing time. As an example of such
a use, we cite the search for a suspect's fingerprints in a criminal
fingerprint database. As the information is more condensed in wavelet
storage, comparisons are very fast.

• Storage in $[d_J]$ format is also useful to detect singularities because they
are associated with large values of the coefficients $[d_J]_{k,l}$. So the presence (or
absence) of such coefficients may reveal special features of the original
image $[c_J]$ very quickly.


6.6.1 Image Processing: Implementation


We assume here that F is an image defined as a two-dimensional pixel array
$[c_J]$.
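
The row-by-row then column-by-column procedure described in Sect. 6.6 can be sketched as follows in Python. The sketch is generic in the one-dimensional step that it applies; as a stand-in it uses a single level of the standard orthonormal Haar step, which is an assumption of this example and may differ in normalization and storage from formulas (6.17) and (6.18). All function names are hypothetical.

```python
import math

R2 = math.sqrt(2.0)


def haar_step(v):
    """One level of an orthonormal Haar step: averages first, then details."""
    half = len(v) // 2
    s = [(v[2 * k] + v[2 * k + 1]) / R2 for k in range(half)]
    d = [(v[2 * k] - v[2 * k + 1]) / R2 for k in range(half)]
    return s + d


def haar_step_inverse(w):
    """Inverse of haar_step."""
    half = len(w) // 2
    v = [0.0] * (2 * half)
    for k in range(half):
        v[2 * k] = (w[k] + w[half + k]) / R2
        v[2 * k + 1] = (w[k] - w[half + k]) / R2
    return v


def transpose(m):
    return [list(col) for col in zip(*m)]


def transform_2d(image, step):
    """Apply a 1D step to every row, then to every column."""
    rows_done = [step(row) for row in image]
    cols_done = [step(col) for col in transpose(rows_done)]
    return transpose(cols_done)


def inverse_2d(coeffs, inverse_step):
    """Undo transform_2d: columns first, then rows."""
    cols_back = [inverse_step(col) for col in transpose(coeffs)]
    return [inverse_step(row) for row in transpose(cols_back)]
```

Any other one-dimensional decomposition and reconstruction pair with the same calling convention (a list in, a list of the same length out) can be passed in place of `haar_step` and `haar_step_inverse`.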





Exercise 6.7. 1. Write a procedure performing decomposition and reconstruction
of a given image $[c_J]$ for all three wavelet functions described in
the previous sections.

2. Check the thresholding compression algorithm. 
3. Compare and visualize all results. 


A solution of this exercise is proposed in Sect. 6.7 at page 149. Fig. 6.11
and Fig. 6.12 display original and reconstructed images. For a threshold value
$\varepsilon = 10^{-5} \times (2^J)^2$, the numbers of nonzero components in $[d_J^\varepsilon]$ are respectively
$nbc_H = 2298$ using the Haar wavelet, $nbc_S = 11295$ using the Schauder
wavelet, and $nbc_D = 1887$ using the D4 Daubechies wavelet. The original document
is a 256 × 256 pixel image, corresponding to a 65536-coefficient matrix $[c_J]$.

















Fig. 6.11. Images. (a) Original; (b) reconstructed (Haar wavelet). 


Multiresolution analysis is rich in theoretical and practical developments. 
Numerous projects are progressing all around the world, making it one of the 
most active areas of research in the mathematical sciences. Readers will find 
a large literature on this subject. Among many papers of interest, we cite 
Cohen (2000, 2003), Daubechies (1992), Mallat (1997), and Meyer (1990).


Fig. 6.12. Reconstructed images. (a) Schauder wavelet; (b) Daubechies wavelet.


6.7 Solutions and Programs 


Solution of Exercise 6.1 





The file MRA_haar.m provides the procedure related to the decomposition and
reconstruction algorithms (6.17) and (6.18). A flag parameter allows one to
switch from decomposition ($[c_J] \to [d_J]$) to reconstruction ($[d_J] \to [c_J]$)
formulas.

The file MRA_haar_ex1.m provides the procedure that generates a sampling 
of a function on the interval [0,1] (for this particular example, the function 
contained in the file MRA_function.m is defined by $f(x) = \exp(-x) \sin 4\pi x$).
This sampling is then used to define a piecewise constant function with the 
help of the procedure MRA_pwcte. Decomposition and reconstruction compu- 
tations are then performed by the procedure MRA_haar. 





Solution of Exercise 6.2 


The file MRA_haar_ex2.m provides the procedure for performing the same 
computations as MRA_haar_ex1, with the difference that all coefficients whose 
absolute values are smaller than the threshold are set to zero. This procedure 
is used for the compression tests of Table 6.1 and Fig. 6.3. 


Solution of Exercise 6.3 





The file MRA_schauder.m provides the procedure related to the decomposition
and reconstruction algorithms (6.25) and (6.26). A flag parameter allows one
to switch from decomposition ($[c_J] \to [d_J]$) to reconstruction ($[d_J] \to [c_J]$)
formulas.

The file MRA_schauder_ex1.m provides the procedure that generates a sampling
of a function on the interval [0,1] (for this particular example, the function
contained in the file MRA_function.m is defined by $f(x) = \exp(-x) \sin 4\pi x$).
This sampling is then used to define a piecewise constant function with the 
help of the procedure MRA_pwcte. Decomposition and reconstruction compu- 
tations are then performed by the procedure MRA_schauder. 





Solution of Exercise 6.4 


The file MRA_schauder_ex2.m provides the procedure for performing the same 
computations as MRA_schauder_ex1, with the difference that all coefficients 
whose absolute values are smaller than the threshold are set to zero. This 
procedure is used for the compression tests of Table 6.2 and Fig. 6.8. 


Solution of Exercise 6.5 


The file MRA_daube4.m provides the procedure related to the decomposition and
reconstruction algorithms (6.29) and (6.30). A flag parameter allows one to
switch from decomposition ($[c_J] \to [d_J]$) to reconstruction ($[d_J] \to [c_J]$)
formulas.

The file MRA_daube4_ex1.m provides the procedure that generates a sampling 
of a function on the interval [0,1] (for this particular example, the function 
contained in file MRA_function.m is defined by $f(x) = \exp(-x) \sin 4\pi x$). This
sampling is then used to define a piecewise constant function with the help 
of the procedure MRA_pwcte. Decomposition and reconstruction computations 
are then performed by the procedure MRA_daube4. 





Solution of Exercise 6.6 


The file MRA_daube4_ex2.m provides the procedure for performing the same 
computations as MRA_daube4_ex1, with the difference that all coefficients 
whose absolute values are smaller than the threshold are set to zero. This 
procedure is used for the compression tests of Table 6.3 and Fig. 6.10. 


Solution of Exercise 6.7 


The files MRA_haar_ex3.m, MRA_schauder_ex3.m, and MRA_daube4_ex3.m
provide procedures that read an image in the file lenna.jpg and then perform
decomposition, compression, and reconstruction steps with the Haar
wavelet (respectively Schauder and Daubechies wavelets). Figs. 6.11 and 6.12
were obtained in this way. Note that in these procedures, decomposition and
reconstruction are performed by successive uses of the one-dimensional
decomposition and reconstruction algorithms.


Elasticity: Elastic Deformation of a Thin Plate


Project Summary 


Level of difficulty: 2 





Keywords: Finite difference method, Laplacian, bilaplacian 


Application fields: Linear elasticity: deformation of a membrane or plate 


7.1 Introduction 


We study in this chapter the deformation of a thin plate. In our example, 
the plate is part of a condenser microphone, such as one may find inside a 
telephone (or a cellular phone). When the user speaks, the plate (which is 
in fact a metalized plastic diaphragm) moves in response to changes in the 
acoustic pressure induced by sound waves. Since the plate is also one side of
an electric capacitor, its dynamic deformations induce variations of the electric
potential, which are amplified to generate a measurable signal. For the sake
of simplicity, we shall consider here a thin rectangular plate in the device 
displayed in Fig. 7.1. 


Fig. 7.1. Sketch of the pressure sensor (side view). Labels: thin plate (or membrane), side support, air, hole, bottom B.


7.2 Modeling Elastic Deformations (Linear Problem)





As a first stage of approximation, we shall neglect electrostatic forces in the 
device and consider that the plate bends exclusively because of the difference 
between the inside and outside values of the acoustic pressure (see Fig. 7.2). 
The pressure is assumed constant inside the device, and we take into account 
only variations of the outside acoustic pressure. There are two physical models 
relating the deformation $f_a$ to the pressure value $P_a$:

• for a high-strained plate,
\[ -c_1 \Delta f_a = P_a, \tag{7.1} \]

• and for a low-strained plate (the term “membrane” is then more appropriate than “plate”),
\[ c_2 \Delta^2 f_a = P_a. \tag{7.2} \]

The coefficients $c_1$ and $c_2$ are physical constants depending on the material and defined as
\[ c_1 = T \quad\text{and}\quad c_2 = \frac{E e^3}{12(1 - \nu^2)}, \]
where e is the thickness of the plate, T the mechanical stress, E the Young
modulus, and ν the Poisson coefficient.


Fig. 7.2. Deformation of the plate. Labels: $P_a$, $h = d - f(x,y)$.


In previous equations the symbol $\Delta$ denotes the Laplacian, a differential
operator defined in two dimensions as
\[ \Delta f_a = \frac{\partial^2 f_a}{\partial x^2} + \frac{\partial^2 f_a}{\partial y^2}. \]
The bilaplacian (or biharmonic operator) $\Delta^2$ is defined accordingly as $\Delta^2 f_a = \Delta(\Delta f_a)$.

In order to have a general formulation of the problem we shall consider in
the following the “mixed” equation
\[ c_2 \Delta^2 f_a - c_1 \Delta f_a = P_a. \tag{7.3} \]


This is a partial differential equation (PDE) of fourth order. For any physically
acceptable value of $P_a$, there exists a solution $f_a$ to equation (7.3) (for
mathematical details see Ciarlet (1978, 2000)). In fact, the solution is not unique,
since for any harmonic function $f_h$ (i.e., a function such that $\Delta f_h = 0$), $f_a + f_h$
is also a solution. This is a direct consequence of the linearity of the Laplacian
and bilaplacian.¹

To ensure uniqueness (which is a crucial feature for the success of a
numerical computation) we shall prescribe appropriate boundary conditions. We
want the solution to satisfy a realistic condition: the plate is assumed to be
fastened along the four sides of the rectangle. This means that the deformation
$f_a$ is zero all along the rim:
\[ f_a|_{\partial\Omega} = 0, \tag{7.4} \]
where $\partial\Omega$ denotes the boundary of the domain $\Omega$ covered by the plate. This
is called a homogeneous Dirichlet boundary condition and is a sufficient
condition to obtain the uniqueness of the solution of equation (7.1), because
the Laplacian is a second-order differential operator (for the proof, see, for
example, Ciarlet (2000)). When considering equation (7.2) or (7.3), a
supplementary boundary condition is required, since the bilaplacian is a fourth-order
differential operator. Denoting by $n$ the outward normal vector² to $\partial\Omega$, this
supplementary condition simulates the elastic “clamping” of the plate along
the whole boundary:
\[ \frac{\partial f_a}{\partial n}\Big|_{\partial\Omega} = 0. \tag{7.5} \]
The boundary condition (7.5) is referred to mathematically as a Neumann
condition.


7.3 Modeling Electrostatic Forces (Nonlinear Problem) 


Let us set the pressure aside for a while and take another look at Fig. 7.1. The plate
and the bottom of the cavity are both made of metallic material and form
the two parts of a capacitor whose dielectric is the air within the cavity. A
dielectric material is a substance that is a poor conductor of electricity, but
an efficient support of electrostatic fields. So, when both parts of the capacitor
have different electric potential values, there exists a force bringing them
closer.

¹ $\Delta(f_a + f_h) = \Delta f_a + \Delta f_h$ and consequently $\Delta^2(f_a + f_h) = \Delta^2 f_a + \Delta^2 f_h$.
² The normal vector, often simply called the “normal”, to a surface is a vector perpendicular to it.

We start with basic relationships expressing the electrostatic energy W
and force F:
\[ W = \frac{1}{2} C U^2 = \frac{\varepsilon S U^2}{2h}, \qquad F = \frac{dW}{dh} = -\frac{\varepsilon S U^2}{2h^2}, \]
as functions of the capacitance C, the potential difference U, the air
permittivity ε, the surface S of the plate, and the capacitor thickness h (i.e., the
distance between the top and bottom plates; see Fig. 7.2). The electrostatic
pressure acting on the plate is then obtained as
\[ P_e = \frac{|F|}{S} = \frac{\varepsilon U^2}{2h^2}. \]

As with the acoustic pressure, the effect of the electrostatic pressure is to
bend the plate. Consequently, the resulting deformation $f_e$ is the solution of
an equation similar to (7.3), with modified right-hand side $P_e$. The difference
between the two cases is that the electrostatic pressure $P_e$ is no longer a
constant, like $P_a$, but depends on the position (x, y) since (see Fig. 7.2)
$h = d - f(x, y)$.

In conclusion, the mathematical model taking into account the electrostatic
forces consists of the following nonlinear PDE:
\[ c_2 \Delta^2 f_e - c_1 \Delta f_e = P_e(f_e) = \frac{\varepsilon U^2}{2(d - f_e)^2}, \tag{7.6} \]
with Dirichlet and Neumann boundary conditions (which make the solution
unique)
\[ f_e = 0 \quad\text{and}\quad \frac{\partial f_e}{\partial n} = 0 \quad\text{on } \partial\Omega. \tag{7.7} \]





7.4 Numerical Discretization of the Problem 


In this section we shall not worry about the right-hand side of the model 
equation (7.3) or (7.6) and discuss only the discretization of the differential 
operators (Laplacian and bilaplacian). We consider, for example, the equation 
(7.3) with boundary conditions (7.7). 

Since it is not generally possible to obtain an exact (analytical) form of
the solution $f_a$, we shall compute an approximate solution on a regular mesh
representing the rectangular plate $\Omega = [0, L_x] \times [0, L_y]$ (see Fig. 7.3). We use
the notation $M_{i,j}$ for the grid point of coordinates $(x_i, y_j)$, with
\[ x_i = i\, h_x, \quad i = 0, \ldots, m_x + 1, \quad h_x = L_x/(m_x + 1), \tag{7.8} \]
\[ y_j = j\, h_y, \quad j = 0, \ldots, m_y + 1, \quad h_y = L_y/(m_y + 1). \tag{7.9} \]
Note that we have to compute only $m_x \cdot m_y$ discrete values $f_{i,j}$ ($i = 1, \ldots, m_x$,
$j = 1, \ldots, m_y$), approximating the values $f_a(M_{i,j}) = f_a(x_i, y_j)$
of the exact solution, because the values on the boundaries are known (i.e.,
$f_{0,j} = f_{m_x+1,j} = f_{i,0} = f_{i,m_y+1} = 0$). These values are obtained by
approximating the differential operators in (7.3) using the finite difference method
(see Chap. 1, or for more details Strikwerda (1989)).
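
As a small illustration of definitions (7.8) and (7.9), the grid coordinates can be generated as follows. This is a Python sketch with a hypothetical function name; the coordinate lists include the boundary points (indices 0 and $m_x + 1$, respectively 0 and $m_y + 1$).

```python
def make_grid(lx, ly, mx, my):
    """Return (xs, ys, hx, hy) for the regular mesh of (7.8)-(7.9).

    xs has mx + 2 entries and ys has my + 2 entries; the first and last
    entries of each list lie on the boundary of the plate.
    """
    hx = lx / (mx + 1)
    hy = ly / (my + 1)
    xs = [i * hx for i in range(mx + 2)]
    ys = [j * hy for j in range(my + 2)]
    return xs, ys, hx, hy


if __name__ == "__main__":
    xs, ys, hx, hy = make_grid(1.0, 1.0, 20, 30)
    # Number of unknowns: interior points only.
    print((len(xs) - 2) * (len(ys) - 2))
```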



































































Fig. 7.3. A regular mesh (or grid). 


To begin with, we address the case of the Laplacian. The easiest way to 
approximate second derivatives in this differential operator is to use centered 
differences, leading to the well-known 5-point (difference) scheme: 








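\[ -(\Delta_5 f)_{i,j} = \frac{1}{h_x^2}\left( -f_{i-1,j} + 2 f_{i,j} - f_{i+1,j} \right) + \frac{1}{h_y^2}\left( -f_{i,j-1} + 2 f_{i,j} - f_{i,j+1} \right). \tag{7.10} \]

This scheme is second-order accurate at any point of the grid, that is,
$-(\Delta_5 f)_{i,j} = -(\Delta f)_{i,j} + O(h_m^2)$, with $h_m = \max(h_x, h_y)$.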
We proceed in the same manner to discretize the bilaplacian $\Delta^2$. We first
substitute in equation (7.10) all $f_{i,j}$ by $-(\Delta_5 f)_{i,j}$:
\[ (\Delta_{13}^2 f)_{i,j} = \frac{1}{h_x^2}\left[ -(-\Delta_5 f)_{i-1,j} + 2(-\Delta_5 f)_{i,j} - (-\Delta_5 f)_{i+1,j} \right] + \frac{1}{h_y^2}\left[ -(-\Delta_5 f)_{i,j-1} + 2(-\Delta_5 f)_{i,j} - (-\Delta_5 f)_{i,j+1} \right]. \]


Fig. 7.4. Discretization of the two-dimensional Laplacian with a 5-point scheme.


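Then, by inserting the expression of $-(\Delta_5 f)_{i,j}$ according to (7.10), we obtain
the so-called 13-point scheme (see Fig. 7.5), which is also an approximation
of second order:

\[ \begin{aligned}
(\Delta_{13}^2 f)_{i,j} ={}& \frac{1}{h_y^4} f_{i,j-2} \\
&+ \frac{2}{h_x^2 h_y^2} f_{i-1,j-1} - \left( \frac{4}{h_x^2 h_y^2} + \frac{4}{h_y^4} \right) f_{i,j-1} + \frac{2}{h_x^2 h_y^2} f_{i+1,j-1} \\
&+ \frac{1}{h_x^4} f_{i-2,j} - \left( \frac{4}{h_x^4} + \frac{4}{h_x^2 h_y^2} \right) f_{i-1,j} + \left( \frac{6}{h_x^4} + \frac{8}{h_x^2 h_y^2} + \frac{6}{h_y^4} \right) f_{i,j} - \left( \frac{4}{h_x^4} + \frac{4}{h_x^2 h_y^2} \right) f_{i+1,j} + \frac{1}{h_x^4} f_{i+2,j} \\
&+ \frac{2}{h_x^2 h_y^2} f_{i-1,j+1} - \left( \frac{4}{h_x^2 h_y^2} + \frac{4}{h_y^4} \right) f_{i,j+1} + \frac{2}{h_x^2 h_y^2} f_{i+1,j+1} \\
&+ \frac{1}{h_y^4} f_{i,j+2}.
\end{aligned} \tag{7.11} \]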

Finally, the discrete form of our PDE reads
\[ c_2 (\Delta_{13}^2 f)_{i,j} - c_1 (\Delta_5 f)_{i,j} = P(M_{i,j}) = P_{i,j}, \tag{7.12} \]
where P stands for either acoustic or electrostatic pressure. These equations,
written for any grid point $M_{i,j}$, with $i = 1, 2, \ldots, m_x$ and $j = 1, 2, \ldots, m_y$,
form a linear system whose unknowns are the $m_x \cdot m_y$ values $f_{i,j}$. It is not
difficult to observe that the discretization (7.11) is not well posed for grid
points near the boundaries, since it involves “ghost” points that do not exist
(for example, the equation for i = 1 and any j requires the value of $f_{-1,j}$,
which is not defined). We are fortunately rescued from this critical situation
by the (Neumann) boundary condition on the normal derivative (7.5), which
allows us to define such ghost points. Indeed, the derivative $\partial f/\partial n$ can be
discretized by the first-order backward finite differences
\[ \frac{f_{i,j} - f_{i-1,j}}{h_x} = 0 \quad\text{and}\quad \frac{f_{i,j} - f_{i,j-1}}{h_y} = 0, \]
for any point $M_{i,j}$ located on the boundary (i.e., i = 0 or $m_x + 1$, or j = 0
or $m_y + 1$). Keeping in mind that $f_{i,j} = 0$ at any such point $M_{i,j}$, following the
(Dirichlet) boundary condition (7.4), we deduce that $f_{i,j} = 0$ for any ghost
point.


Fig. 7.5. Discretization of the two-dimensional bilaplacian with a 13-point scheme.


Remark 7.1. We can use a simple programming trick when computing the
$(m_x \cdot m_y) \times (m_x \cdot m_y)$ matrix of the linear system (7.12). For the rows corresponding to
i = 1 or j = 1 (and similarly, for $i = m_x$ or $j = m_y$), we simply set the
nonexistent (ghost) values to zero (i.e., if i < 1 or j < 1 and similarly for
$i > m_x$ or $j > m_y$). The right-hand side of the system is thus not affected.


Remark 7.2. As one should expect from the discussion on the uniqueness
of the solution of the continuous PDE (7.3), implementing discrete boundary
conditions renders the matrix of the linear system (7.12) invertible.
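
The trick of Remark 7.1 amounts to skipping every stencil entry that points outside the set of unknowns. The Python sketch below assembles a dense matrix for (7.12) in this way. The function names, the dense list-of-lists storage, and the numbering of the unknowns (index (j - 1) m_x + (i - 1) for the point $M_{i,j}$) are assumptions made for this example.

```python
def combined_stencil(hx, hy, c1, c2):
    """Coefficients of c2 * Delta_13^2 - c1 * Delta_5, keyed by (di, dj)."""
    ax, ay, axy = 1.0 / hx**4, 1.0 / hy**4, 1.0 / (hx**2 * hy**2)
    s = {(0, 0): c2 * (6 * ax + 8 * axy + 6 * ay)
                 + c1 * (2.0 / hx**2 + 2.0 / hy**2)}
    for sg in (-1, 1):
        s[(sg, 0)] = -c2 * (4 * ax + 4 * axy) - c1 / hx**2
        s[(0, sg)] = -c2 * (4 * ay + 4 * axy) - c1 / hy**2
        s[(2 * sg, 0)] = c2 * ax
        s[(0, 2 * sg)] = c2 * ay
        s[(sg, 1)] = c2 * 2 * axy
        s[(sg, -1)] = c2 * 2 * axy
    return s


def assemble(mx, my, hx, hy, c1, c2):
    """Dense matrix of (7.12); boundary and ghost values are taken as zero."""
    stencil = combined_stencil(hx, hy, c1, c2)
    n = mx * my
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(1, my + 1):
        for i in range(1, mx + 1):
            row = (j - 1) * mx + (i - 1)
            for (di, dj), w in stencil.items():
                ii, jj = i + di, j + dj
                # Entries outside 1..mx, 1..my multiply zero values: skip them.
                if 1 <= ii <= mx and 1 <= jj <= my:
                    matrix[row][(jj - 1) * mx + (ii - 1)] += w
    return matrix
```

A row associated with a point far from the boundary contains the full thirteen coefficients, whereas a row associated with a corner point such as $M_{1,1}$ keeps only six of them.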


7.5 Programming Tips 


7.5.1 Modular Programming 


In the previous section, different logical steps have been pointed out when 
implementing the finite difference method: 





1. definition of the coordinates $(x_i, y_j)$ of any mesh point,
2. construction of the linear system (7.12),
3. introduction of the boundary conditions,
4. solving the linear system,
5. visualization of the results.


Any scientific package has to deal separately with each item of this list by 
associating a specialized computing procedure. These procedures are called 
modules. Results (outputs) of a given step module are data (inputs) for the 
next step module. Several modules may exist for the same logical step; in 
this case they all have to share similar formatted input and provide similar 
formatted output. 

For the present study, we shall also proceed step by step, by setting up 
progressively the numerical operators. First, we neglect the effect of the elec- 
trostatic pressure in order to check the programs required to solve the linear 
problem (7.12). Then we shall deal with the nonlinear problem by iterating 
on successive linear problems. 





7.5.2 Program Validation 


Some questions arise when one uses a numerical approximation to
solve a problem such as (7.3). Are we sure that the correct solution is computed,
and with an efficient procedure? How many points are required in order to get an
accurate numerical solution? There is a simple way to answer these questions:
it consists in solving a problem for which an exact solution is known, and then
comparing the computed result to the exact one.

Let us consider, for example, the Laplace equation (7.1). We may choose
a more or less complicated solution of the PDE, as for example
\[ f_a(x, y) = 100 \sin(3.7\pi x) \sin(5.4\pi y) + (3.7x - 5.4y), \tag{7.13} \]
and calculate the corresponding right-hand side (considering $c_1 = 1$):
\[ P_a(x, y) = -\Delta f_a(x, y) = 100\,(3.7^2 + 5.4^2)\,\pi^2 \sin(3.7\pi x) \sin(5.4\pi y). \tag{7.14} \]


A program solving the PDE $-\Delta f_a = P_a$, with boundary conditions
$f_a|_{\partial\Omega} = g(x, y)$, needs two inputs: $P_a(x, y)$ and $g(x, y)$. Inserting in the
program (as discrete input data) the expression (7.14) for $P_a(x, y)$ and (7.13)
for $g(x, y)$, we should obtain numerical values $f_{i,j}$ close to the exact values
$f_a(x_i, y_j)$ at the same grid points $(x_i, y_j)$. We are now able to compare the
two solutions (exact and numerical) qualitatively by plotting the results in
the same graphical window and quantitatively by computing, for example,
the following relative approximation error:
\[ \text{Error} = \left[ \frac{\sum_{i,j} \left( f_a(x_i, y_j) - f_{i,j} \right)^2}{\sum_{i,j} f_a(x_i, y_j)^2} \right]^{1/2}. \tag{7.15} \]
This error has to be “reasonably” small and to diminish when the number of
grid points is increased. If this is not the case, the program must be checked.
The same validation procedure can be used for the PDEs (7.2) and (7.3).
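
The validation procedure can be tried on the Laplace problem (7.1) with the test solution (7.13). The Python sketch below is an illustration only: it assumes a unit square plate, uses a conjugate gradient iteration as the linear solver (a choice made here to stay within the standard library, not the method of the book's programs), and returns the relative error (7.15).

```python
import math


def exact(x, y):
    # Test solution (7.13).
    return (100.0 * math.sin(3.7 * math.pi * x) * math.sin(5.4 * math.pi * y)
            + (3.7 * x - 5.4 * y))


def rhs(x, y):
    # Right-hand side (7.14), with c1 = 1.
    return (100.0 * (3.7**2 + 5.4**2) * math.pi**2
            * math.sin(3.7 * math.pi * x) * math.sin(5.4 * math.pi * y))


def apply_scheme(u, hx, hy):
    """Scheme (7.10) at the interior nodes of a padded grid, zero elsewhere."""
    nx, ny = len(u), len(u[0])
    out = [[0.0] * ny for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            out[i][j] = ((2 * u[i][j] - u[i - 1][j] - u[i + 1][j]) / hx**2
                         + (2 * u[i][j] - u[i][j - 1] - u[i][j + 1]) / hy**2)
    return out


def dot(a, b):
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def validation_error(mx, my, tol=1e-10):
    """Solve -Laplacian(f) = P on the unit square and return the error (7.15)."""
    hx, hy = 1.0 / (mx + 1), 1.0 / (my + 1)
    xs = [i * hx for i in range(mx + 2)]
    ys = [j * hy for j in range(my + 2)]
    idx = [(i, j) for i in range(mx + 2) for j in range(my + 2)]
    edge = lambda i, j: i in (0, mx + 1) or j in (0, my + 1)
    # Boundary data g, zero at interior nodes; it is moved to the right-hand side.
    g = [[exact(xs[i], ys[j]) if edge(i, j) else 0.0
          for j in range(my + 2)] for i in range(mx + 2)]
    ag = apply_scheme(g, hx, hy)
    b = [[0.0 if edge(i, j) else rhs(xs[i], ys[j]) - ag[i][j]
          for j in range(my + 2)] for i in range(mx + 2)]
    # Conjugate gradient on the interior unknowns (boundary entries stay zero).
    w = [[0.0] * (my + 2) for _ in range(mx + 2)]
    r = [row[:] for row in b]
    p = [row[:] for row in b]
    rr = dot(r, r)
    stop = tol**2 * rr
    for _ in range(10 * mx * my):
        if rr <= stop:
            break
        ap = apply_scheme(p, hx, hy)
        alpha = rr / dot(p, ap)
        for i, j in idx:
            w[i][j] += alpha * p[i][j]
            r[i][j] -= alpha * ap[i][j]
        rr_new = dot(r, r)
        beta = rr_new / rr
        for i, j in idx:
            p[i][j] = r[i][j] + beta * p[i][j]
        rr = rr_new
    num = sum((exact(xs[i], ys[j]) - w[i][j])**2
              for i in range(1, mx + 1) for j in range(1, my + 1))
    den = sum(exact(xs[i], ys[j])**2
              for i in range(1, mx + 1) for j in range(1, my + 1))
    return math.sqrt(num / den)
```

Calling `validation_error(10, 15)` and then `validation_error(20, 30)` should show the error decreasing as the grid is refined, which is the behavior expected from a correct program.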
7.6 Solving the Linear Problem





In order to solve the linear problem (7.12) we use the following notation:


1. $n = m_x \cdot m_y$ is the total number of grid points,

2. Ah5 is the matrix associated with the 5-point scheme (including boundary
conditions),

3. Ah13 is the matrix associated with the 13-point scheme (including boundary
conditions),

4. b5, b13 are the corresponding right-hand sides.


Exercise 7.1. 1. Write a program generating all the coordinates $(x_i, y_j)$ of
the grid points $M_{i,j}$, for $i = 1, 2, \ldots, m_x$ and $j = 1, 2, \ldots, m_y$.

2. Write a program computing the matrix Ah5 and the corresponding right-hand
side b5, in order to solve equation (7.1).

3. Write a program computing the matrix Ah13 and the corresponding right-hand
side b13, in order to solve equation (7.2).

4. Write a program computing the matrix Ah and the corresponding right-hand
side b, in order to solve the complete problem (7.3), including the
boundary conditions. Solve this problem for a given value of the pressure.

5. Visualize the results. 

N.B. All the programs must be checked using the validation procedure 
described above. 


A solution of this exercise and related procedures are described in Sect.
7.8 at page 162. We show here (see Fig. 7.6) a plot of the numerical solution
obtained from validating the program solving the linear problem (7.3). The
right-hand side of the PDE was calculated such that (7.13) becomes the exact
solution. Even though a coarse mesh was used ($n_x = 20$ and $n_y = 30$), the
numerical solution is very close to the exact one. The relative error (7.15) for
this numerical experiment has the value Error = 0.0375. This error diminishes
as the number of grid points increases (Error = 0.0095 for $n_x = 40$ and
$n_y = 60$), but the computing time is considerably larger for this last run!


7.7 Solving the Nonlinear Problem 
We now address the nonlinear problem, corresponding to a more realistic case 


when the plate is part of a microphone and subject to both acoustic and 
electrostatic pressures. 


7.7.1 A Fixed-Point Algorithm 
The resulting nonlinear problem is
\[ c_2 \Delta^2 f - c_1 \Delta f = P_a + P_e(f), \tag{7.16} \]
with Dirichlet and Neumann boundary conditions
\[ f = 0 \quad\text{and}\quad \frac{\partial f}{\partial n} = 0 \quad\text{on } \partial\Omega. \tag{7.17} \]


Fig. 7.6. Numerical solution obtained in validating the program solving the linear
problem (7.3) by “imposing” the exact solution (7.13).

To solve this problem we use a fixed-point algorithm. We start by solving
the linear problem (7.3) corresponding to $P_e = 0$; we denote by $f_0$ this
solution. We then define the sequence $\{f_k\}_{k \in \mathbb{N}}$ of solutions of successive linear
problems:
\[ c_2 \Delta^2 f_{k+1} - c_1 \Delta f_{k+1} = P_a + P_e(f_k). \tag{7.18} \]
Since the fixed-point algorithm is an iterative method, we have to choose a
stopping criterion to decide whether a solution $f_k$ is accurate enough to be
a good approximation of the exact solution. A classical criterion is based on
the relative variation of the approximate solution $f_k$:
\[ \max_{x,y} |f_{k+1}(x, y) - f_k(x, y)| < \varepsilon \max_{x,y} |f_k(x, y)|, \tag{7.19} \]
where ε is the convergence threshold.
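
The structure of the fixed-point algorithm is easier to see on a reduced problem. The Python sketch below applies iteration (7.18) and a stopping test of the form (7.19) to a one-dimensional analogue, $-u'' = P_a + P_e(u)$ on (0, 1) with $u(0) = u(1) = 0$. The one-dimensional model, the tridiagonal solver, the parameter values, and the function names are all assumptions made for this illustration; the plate problem replaces the tridiagonal solve by the solution of the linear system (7.12).

```python
def solve_linear(rhs, h):
    """Solve (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = rhs[i] with zero end values
    (Thomas algorithm for the tridiagonal matrix)."""
    n = len(rhs)
    diag = [2.0] * n
    b = [h * h * v for v in rhs]
    for i in range(1, n):          # forward elimination
        b[i] += b[i - 1] / diag[i - 1]
        diag[i] = 2.0 - 1.0 / diag[i - 1]
    u = [0.0] * n
    u[-1] = b[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = (b[i] + u[i + 1]) / diag[i]
    return u


def fixed_point(p_a, p_e, m, eps=1e-3, max_iter=50):
    """Iterate f_{k+1} = L^{-1}(P_a + P_e(f_k)) until the relative variation
    max|f_{k+1} - f_k| < eps * max|f_k|. Returns (solution, iterations)."""
    h = 1.0 / (m + 1)
    f = solve_linear([p_a] * m, h)  # f_0: linear problem with P_e = 0
    for k in range(1, max_iter + 1):
        f_new = solve_linear([p_a + p_e(v) for v in f], h)
        change = max(abs(a - b) for a, b in zip(f_new, f))
        scale = max(abs(v) for v in f)
        f = f_new
        if change < eps * scale:
            return f, k
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    solution, iterations = fixed_point(1.0, lambda v: 0.1 / (1.0 - v) ** 2, 50)
    print(iterations, max(solution))
```

The nonlinear term is evaluated with the previous iterate only, so each step costs one linear solve; this is the same design as in (7.18).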


7.7.2 Numerical Solution 


Once again, we shall first consider a test problem before solving (7.16) in order
to validate the procedures. We choose the same test solution f as previously
(7.13) and compute the corresponding right-hand side. For this case, we also
have to impose the expression of the nonlinear term $P_e(f)$, depending on the
solution. For example, we can use the function defined by
\[ P_e(f) = \frac{100}{(200 - f)^2}. \]
For this choice and for the same grid ($n_x = 20$ and $n_y = 30$), the solution
converges after only two iterations of the fixed-point algorithm (the convergence
threshold is fixed to ε = 0.001). We check that the plot of the solution is
similar to that displayed in Fig. 7.6. More details on the solution procedures
can be found in Sect. 7.8 at page 162.

We consider now a more realistic choice of the values of the physical
parameters appearing in problem (7.16). The atmospheric pressure value is set
to $10^5$ [Pa] or [N/m²]; the acoustic pressure $P_a$ then goes from $10^{-5}$ [Pa] to
10 [Pa]. It is established that the human ear can perceive pressure variations
from $2 \times 10^{-5}$ [Pa] up to 2 [Pa]. We may choose, without any damage to one’s
hearing or numerical procedures, the value of 1 [Pa] for the acoustic pressure
variation.

We assume that the plate is made of silicon; for such a material the Young
modulus is $E = 1.3 \cdot 10^{11}$ [Pa] and the Poisson coefficient is ν = 0.25. The
device displayed in Fig. 7.1 has the following characteristic dimensions: length
1 [mm], width 1 [mm], and thickness e = 1 [µm]. The mechanical stress of
the plate is T = 100 [N/m]. Concerning the capacitor, the thickness (without
pressure variations) is d = 5 [µm]. The dry-air permittivity is $\varepsilon = 8.85 \cdot 10^{-12}$
[F/m], and the polarization potential is V = 25 [V]. The mesh of the plate,
as displayed in Fig. 7.3, has $n_x = 20$ and $n_y = 30$ grid points, resulting in a
total number of 600 discretization points, and the same number of unknowns.





Exercise 7.2. 1. Modify the procedure used to solve Exercise 7.1 in order 
to use the above realistic data for the linear problem (7.3). 
2. Write a program implementing the fixed-point algorithm. 
3. Solve the nonlinear problem (7.16). 
4. Visualize the results. 


A solution of this exercise is proposed in Sect. 7.8 at page 162. 


Hint: We first solve the acoustic problem (7.3) with boundary conditions
(7.7) and obtain a deformation as plotted in Fig. 7.7. The maximum deformation
is located at the center of the plate ($\max f_a = 0.080$ [μm]).

In the next step, we consider the complete problem (7.16) including boundary
conditions (7.17). The fixed-point algorithm will converge within three
iterations when we use the stopping criterion (7.19) with $\varepsilon = 0.001$.
The maximum deformation of the plate (see Fig. 7.7) is reached again at the
center of the plate ($\max f_e = 0.077$ [μm]).





Remark 7.3. It is important to note that the relative values of the acoustic
pressure $P_a$ and the polarization potential $V$ were chosen in order to
respect the constraint $\max f_a < d$ (see Fig. 7.2).


3 As physical units, we use [Pa] = pascal, [N] = newton, [m] = meter, [mm] =
millimeter, [μm] = micrometer, [F] = farad, [V] = volt.

Fig. 7.7. Numerical solution of the realistic problem. (a) linear; (b) nonlinear.


7.8 Solutions and Programs 


Solution of Exercise 7.1 


The file ELAS_plate_ex.m contains the procedure that solves the problem (7.3)
and computes the test solution defined in ELAS_solution.m. This procedure
calls the functions written in the files ELAS_lap_matrix.m and ELAS_lap_rhs.m,
which compute the linear system obtained from the discretization of
equation (7.1). Similarly, the procedures in the files ELAS_bilap_matrix.m and
ELAS_bilap_rhs.m compute the linear system corresponding to equation (7.2).
The computed test solution is plotted in Fig. 7.6.





Solution of Exercise 7.2 


The main program solving the nonlinear problem (7.16) with realistic
coefficients is written in the file ELAS_microphone_ex.m. It also contains the
fixed-point algorithm. Functions in the files ELAS_lap_matrix.m and
ELAS_lap_rhs.m (for the Laplacian part of the PDE) and in the files
ELAS_bilap_matrix.m and ELAS_bilap_rhs.m (for the bilaplacian part of the PDE)
are used as previously. The new procedure ELAS_pressure defines the nonlinear
term. The numerical solutions obtained are displayed in Fig. 7.7.


7.8.1 Further Comments 


In this section we address the important point of the construction of the
matrices arising from the approximation of the operators. The use of the
rectangular mesh (displayed in Fig. 7.3), with a lexical ordering of the
nodes,⁴ together with the 5-point scheme approximation of the Laplacian (see
Fig. 7.4), leads to a very particular pattern of the matrix $A_{h5}$, displayed
in Fig. 7.8. Such a matrix is called banded, of bandwidth $2m_x + 1$, because
its coefficients satisfy $(A_{h5})_{ij} = 0$ if $|i - j| > m_x$. Note that this
matrix is sparse because it contains at most $5 m_x m_y$ nonzero coefficients.
For the same reasons, the matrix associated with the bilaplacian (see Fig. 7.9)
is banded of bandwidth $4m_x + 1$, and sparse with at most $13 m_x m_y$ nonzero
coefficients.

4 We order the nodes starting from the bottom line, from left to right,
$1, 2, \ldots, m_x$; then we continue with the line just above, from left to
right, $m_x + 1, m_x + 2, \ldots, 2m_x$, and so on.

These properties are useful for reducing storage, because memory use grows
quickly with the number of unknowns. Scientific programs have to exploit
these properties carefully; thankfully, MATLAB is a user-friendly environment
and provides very simple ways to build such matrices. For example, the
following procedure (ELAS_lap_matrix) computes the matrix $A_{h5}$:


n = nx*ny;
h2x = hx*hx;  h2y = hy*hy;
Ah5 = sparse(n,n);                        % empty n-by-n sparse matrix
Dx = toeplitz([2 -1 zeros(1,nx-2)]);      % tridiagonal (-1, 2, -1)
Dx = Dx / h2x;
Dy = eye(nx,nx);
Dy = -Dy / h2y;                           % off-diagonal block
Dx = Dx - 2*Dy;                           % diagonal block
for k = 1:(ny-1)
    i = (k-1)*nx;  j = k*nx;
    Ah5((i+1):(i+nx), (i+1):(i+nx)) = Dx;
    Ah5((j+1):(j+nx), (i+1):(i+nx)) = Dy;
    Ah5((i+1):(i+nx), (j+1):(j+nx)) = Dy;
end
i = (ny-1)*nx;
Ah5((i+1):(i+nx), (i+1):(i+nx)) = Dx;     % last diagonal block


This program calls MATLAB built-in functions: 


1. sparse is used to declare a low-storage sparse matrix;

2. toeplitz is used to define a Toeplitz matrix; here Dx is an
   $n_x \times n_x$ symmetric tridiagonal matrix whose entries are
   $(Dx)_{i,i} = 2$ and $(Dx)_{i,i \pm 1} = -1$;

3. eye is used to define the $n_x \times n_x$ identity matrix;

4. then, the nonzero coefficients of $A_{h5}$ are defined “block by block”,
   using Dx to set the diagonal blocks and Dy for the off-diagonal blocks.


Unfortunately, such a structure occurs in a very particular case, strongly 
depending on the geometry: for a nonrectangular mesh or with a random 
ordering of the nodes, the resulting matrix has a less-regular pattern (see for 
instance Chap. 11). Nevertheless, it remains sparse because this property is 
related only to the approximation scheme. 
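
For comparison, the same block structure can also be assembled without an explicit loop, using Kronecker products of one-dimensional second-difference matrices. This is only a sketch of an alternative construction, not the procedure used in ELAS_lap_matrix; the small grid sizes and the names `Tx` and `Ty` are assumptions chosen for illustration.

```matlab
% Loop-free assembly of the 5-point Laplacian matrix (lexical ordering,
% x index running fastest), as a sketch of an alternative construction.
nx = 4;  ny = 3;  hx = 0.1;  hy = 0.2;              % small grid (assumption)
ex = ones(nx,1);  ey = ones(ny,1);
Tx = spdiags([-ex 2*ex -ex], -1:1, nx, nx)/hx^2;    % second difference in x
Ty = spdiags([-ey 2*ey -ey], -1:1, ny, ny)/hy^2;    % second difference in y
Ah5 = kron(speye(ny), Tx) + kron(Ty, speye(nx));    % sparse (nx*ny)-by-(nx*ny)
```

The first Kronecker product places Tx on the diagonal blocks; the second adds 2/hy^2 to the diagonal and puts -1/hy^2 times the identity on the off-diagonal blocks, which reproduces the blocks Dx and Dy of the listing above.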
Fig. 7.8. Matrix $A_{h5}$.    Fig. 7.9. Matrix $A_{h13}$.

Domain Decomposition Using a Schwarz Method


Project Summary 


Level of difficulty: 2 


Keywords: Domain decomposition, Schwarz method with overlapping,
Laplacian discretization, 1D and 2D finite difference


Application fields: Thermal analysis, steady heat equation


8.1 Principle and Application Field of Domain 
Decomposition 


Realistic modeling of physical problems often involves systems of partial dif- 
ferential equations (PDE), usually nonlinear, and defined on domains that 
can have both large size and complex shape. In most cases, the selected nu- 
merical method requires that one discretize the domain, and the number of 
degrees of freedom can easily be more than what the available computer will 
handle. Modeling of the air flow around an aircraft with 3D finite elements 
requires, for instance, the discretization of the surrounding domain with a 
few million points, with several unknowns to determine at each point. The
numerical scheme can furthermore be implicit and hence require the solution of
linear systems with this impressive number of unknowns. If we do not have
a supercomputer at hand, which only very specialized research centers do, the
matrix of such a system cannot even fit within the memory of the computer.

A simple way around this technological barrier is to subdivide the problem
into smaller ones, that is, to compute the solution piecewise, as the solution
of problems defined on subdomains of the initial one. Eventually, the global
solution is obtained by patching together the solutions of all the partial
problems.

This method can also simplify the solution of problems set originally on
complexly shaped domains, by selecting a decomposition in which each
subdomain has a simpler, elementary shape, making the local solution simpler
to compute. Another possible extension is the coupling of equations in order
to treat interactions between two different physical phenomena defined on
neighboring domains, fluid-structure interaction, for instance.

The main difficulty arising in adopting this method is the definition of 
boundary conditions on each subdomain. Actually, the internal boundaries, 
in contrast to boundaries of the global domain, are fictitious, and the physics 
of the problem doesn’t provide boundary conditions. Two strategies, both iter- 
ative, can be adopted: The first one consists in doing a domain decomposition 
with partial overlapping of the subdomains, and using the previous iteration 
solution on neighboring subdomains to define the boundary conditions on the 
current subdomain. The second strategy consists in partitioning the global 
domain into nonoverlapping subdomains and imposing continuity conditions 
at the interfaces. 

Once the decomposition strategy is selected, the solution method on each
subdomain is the same as on the global domain, with now a reasonably small
number of unknowns. To fix the ideas, consider the simple example of a scalar
equation solved by finite differences on a structured mesh that can be
decomposed into $P$ subdomains of the same size $N$. The numerical treatment
will require solving $P$ linear systems costing $O(N^2)$ on each subdomain,
that is, a total cost of $O(PN^2)$ per iteration, instead of $O(P^2N^2)$ for
the global problem. The added cost of the new method comes from the iterative
nature of the algorithm, and therefore the size and the number of the
subdomains must be carefully selected in order to ensure the competitiveness
of the algorithm.

In any case, even if the computing time increases compared to the initial 
global scheme, we always gain the crucial advantage of being able to fit the 
problem in the computer’s memory. 

Last but not least, even though this advantage cannot be illustrated within
the scope of a MATLAB project, domain decomposition methods have shown their
full worth with the development of parallel computing (see for instance Smith,
Bjørstad, and Gropp (1996)). The solution of the problems set on the
subdomains can be distributed to different processors, with a serious hope of
computing-time speedup as well as memory savings.

To illustrate the principles of the method, this project proposes an
implementation of the Schwarz method with overlapping on model problems of the
1D and 2D Laplacian

\[
\begin{cases}
-\Delta u(x) + c(x)u(x) = f(x), & \text{for } x \in \Omega \subset \mathbb{R}^D, \\
u = g, & \text{on } \partial\Omega.
\end{cases} \tag{8.1}
\]


8.2 One-Dimensional Finite Difference Solution 


In one dimension, the above problem becomes

\[
\begin{cases}
-u''(x) + c(x)u(x) = f(x), & \text{for } x \in (a,b), \\
u(a) = u_a, \\
u(b) = u_b.
\end{cases} \tag{8.2}
\]


This problem models, for instance, the bending of a beam of length $b - a$, of
constant section and of stiffness coefficient $c(x)$, pulled along its
longitudinal axis and subjected to a transverse load $f(x)\,dx$. The finite
difference method for second-order boundary value problems such as the one
above is described in detail in Lucquin and Pironneau (1996), and we just
summarize the main features here. We discretize the interval $[a,b]$ with
$n + 2$ points $x_i = a + ih$ for $i = 0, \ldots, n + 1$, with a uniform step
$h = \frac{b-a}{n+1}$. We denote by $U$ the vector formed by the approximations
of the solution $u(x)$ at the points $x_i$. We set $U_0 = u_a$ and
$U_{n+1} = u_b$ to ensure that the numerical solution satisfies the boundary
conditions. The finite difference discretization of the second derivative (as
described, for instance, in Chap. 1) leads to the linear system








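\[ A_h U = B_h, \]

where $A_h$ is the $n \times n$ tridiagonal matrix

\[
A_h = \frac{1}{h^2}
\begin{pmatrix}
2 + h^2 c_1 & -1 & 0 & \cdots & 0 \\
-1 & 2 + h^2 c_2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 + h^2 c_{n-1} & -1 \\
0 & \cdots & 0 & -1 & 2 + h^2 c_n
\end{pmatrix},
\]

where $c_i = c(x_i)$, and $B_h$ is the following vector in $\mathbb{R}^n$:

\[
B_h =
\begin{pmatrix}
f(a + h) + \dfrac{u_a}{h^2} \\
f(a + 2h) \\
\vdots \\
f(b - 2h) \\
f(b - h) + \dfrac{u_b}{h^2}
\end{pmatrix}.
\]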
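
The assembly and solution of this tridiagonal system can be sketched in a few MATLAB lines. The data below (interval, boundary values, and the functions `c` and `f`) are assumptions chosen only to make the sketch runnable; they are not the data of the book's programs.

```matlab
% Finite difference solution of -u'' + c u = f on (a,b), u(a)=ua, u(b)=ub.
a = 0;  b = 1;  ua = 1;  ub = 2;          % interval and boundary values (assumption)
n = 99;  h = (b - a)/(n + 1);
x = a + (1:n)'*h;                         % interior points x_1, ..., x_n
c = @(x) 10*ones(size(x));                % coefficient c(x) (assumption)
f = @(x) ones(size(x));                   % right-hand side f(x) (assumption)
e  = ones(n,1);
Ah = spdiags([-e, 2 + h^2*c(x), -e], -1:1, n, n)/h^2;   % tridiagonal matrix
Bh = f(x);
Bh(1) = Bh(1) + ua/h^2;                   % contribution of u(a)
Bh(n) = Bh(n) + ub/h^2;                   % contribution of u(b)
U  = Ah \ Bh;                             % approximations of u(x_i)
```

Storing the matrix in sparse format keeps the memory use proportional to n, and the backslash operator then solves the tridiagonal system at a cost that also grows linearly with n.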


8.3 Schwarz Method in One Dimension 


For simplicity, we first decompose the computational domain $[a,b]$ into two
overlapping subdomains: we choose an odd value $n$ and two integer values
$i_l$ and $i_r$, symmetric with respect to $\frac{n+1}{2}$, such that
$i_l < \frac{n+1}{2} < i_r$. We set $x_l = a + i_l h$ and $x_r = a + i_r h$,
thus defining two intervals $]a, x_r[$ and $]x_l, b[$ with a nonempty overlap
$[a, x_r] \cap [x_l, b] = [x_l, x_r] \neq \emptyset$. We now plan to compute the
solution $u$ of the problem (8.2) by solving two problems set on the
subintervals $[a, x_r]$ and $[x_l, b]$:

\[
(P_1)\ \begin{cases}
-u_1''(x) + c(x)u_1(x) = f(x), & \text{for } x \in\, ]a, x_r[, \\
u_1(a) = u_a, \\
u_1(x_r) = \alpha,
\end{cases}
\quad \text{and} \quad
(P_2)\ \begin{cases}
-u_2''(x) + c(x)u_2(x) = f(x), & \text{for } x \in\, ]x_l, b[, \\
u_2(x_l) = \beta, \\
u_2(b) = u_b.
\end{cases}
\]

The solution $u_1$ (respectively $u_2$) is expected to be the restriction to
the interval $[a, x_r]$ (respectively $[x_l, b]$) of the solution $u$ of the
problem set on the full interval $[a,b]$. The two solutions $u_1$ and $u_2$
must therefore be identical within the overlapping region $[x_l, x_r]$, which
allows us to define the boundary conditions at $x_l$ and $x_r$:

\[ u_1(x_r) = \alpha = u_2(x_r) \quad \text{and} \quad u_2(x_l) = \beta = u_1(x_l). \]

Since we do not know a priori the values of $\alpha$ and $\beta$, we solve the
two problems iteratively: $\alpha$ is fixed arbitrarily at first, for instance
by linear interpolation of the global boundary conditions,

\[ \alpha = \frac{1}{b - a} \big( u_a (b - x_r) + u_b (x_r - a) \big). \]





Then we set $u_2^0(x_r) = \alpha$ and we compute for $k = 1, 2, \ldots$ the
solutions $u_1^k$ and $u_2^k$ of the following problems:

\[
(P_1)\ \begin{cases}
-(u_1^k)''(x) + c(x)u_1^k(x) = f(x), & \text{for } x \in (a, x_r), \\
u_1^k(a) = u_a, \\
u_1^k(x_r) = u_2^{k-1}(x_r),
\end{cases}
\]

\[
(P_2)\ \begin{cases}
-(u_2^k)''(x) + c(x)u_2^k(x) = f(x), & \text{for } x \in (x_l, b), \\
u_2^k(x_l) = u_1^k(x_l), \\
u_2^k(b) = u_b.
\end{cases}
\]

We claim that when the overlap region is nonempty, this algorithm converges
to the solution $u$ of the global problem (8.2) as $k \to \infty$. This result
was first proved using a fixed-point theorem by Schwarz (1870), in the case
$c(x) = 0$. It was rediscovered one century later using a variational
formulation approach by Lions (1988). More efficient methods from the
algorithmic point of view have been developed since then; they do not require
overlap but impose additional transfer conditions between subdomains. The
following paragraphs illustrate the convergence numerically, after
discretization of $(P_1)$ and $(P_2)$ by finite differences.





8.3.1 Discretization 


The problems $(P_1)$ and $(P_2)$ are solved using finite differences, in the
same manner as used for the global problem (8.2) in the previous section. The
bounds $x_l$ and $x_r$ have been set so that the two subdomains are of the same
size. Denoting by $V^k$ (respectively $W^k$) the vector of the approximate
discrete solution on the subdomain $[a, x_r]$ (respectively $[x_l, b]$), the
algorithm for the $k$th iteration is as follows:

\[
\begin{array}{l}
\text{initialization: } W^0_{i_r - i_l} = \alpha \\
\text{for } k = 1, 2, \ldots \text{ do} \\
\quad A_{h,l} V^k = B_{h,l} + \dfrac{1}{h^2} \left( u_a, 0, \ldots, 0, W^{k-1}_{i_r - i_l} \right)^T, \\
\quad A_{h,r} W^k = B_{h,r} + \dfrac{1}{h^2} \left( V^k_{i_l}, 0, \ldots, 0, u_b \right)^T, \\
\text{end}
\end{array} \tag{8.3}
\]

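where $A_{h,l}$ (respectively $A_{h,r}$) is the discretization matrix for the
operator $-\Delta + cI$ on $]a, x_r[$ (respectively $]x_l, b[$), of size
$(i_r - 1) \times (i_r - 1)$ (respectively $(n - i_l) \times (n - i_l)$):

\[
A_{h,l} = \frac{1}{h^2}
\begin{pmatrix}
2 + h^2 c_1 & -1 & 0 & \cdots & 0 \\
-1 & 2 + h^2 c_2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 + h^2 c_{i_r - 2} & -1 \\
0 & \cdots & 0 & -1 & 2 + h^2 c_{i_r - 1}
\end{pmatrix}
\]

and

\[
A_{h,r} = \frac{1}{h^2}
\begin{pmatrix}
2 + h^2 c_{i_l + 1} & -1 & 0 & \cdots & 0 \\
-1 & 2 + h^2 c_{i_l + 2} & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 + h^2 c_{n-1} & -1 \\
0 & \cdots & 0 & -1 & 2 + h^2 c_n
\end{pmatrix}.
\]

The vectors $B_{h,l}$ and $B_{h,r}$ contain the values of the right-hand-side
term $f$ evaluated at the discretization points:

\[
\begin{aligned}
(B_{h,l})_i &= f(x_i), && \text{for } i = 1, \ldots, i_r - 1, \\
(B_{h,r})_i &= f(x_{i + i_l}), && \text{for } i = 1, \ldots, n - i_l.
\end{aligned}
\]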
The stopping criterion in the iteration loop is obtained by measuring the gap
between the two solutions within the overlap region $[x_l, x_r]$, where they
should coincide in the limit:

\[ \|e^k\| < \varepsilon, \quad \text{with } e^k_i = V^k_i - W^k_{i - i_l} \text{ for } i = i_l + 1, \ldots, i_r - 1. \]

In order to test the performance of the method, we can also compare the
results with the solution $U$ obtained in Sect. 8.2 using the classical method
on the whole domain. We accordingly compute two vectors $e_l^k$ and $e_r^k$ of
components

\[
\begin{aligned}
(e_l^k)_i &= U_i - V^k_i, && \text{for } i = 1, \ldots, i_r - 1, \\
(e_r^k)_i &= U_i - W^k_{i - i_l}, && \text{for } i = i_l + 1, \ldots, n,
\end{aligned}
\]

and we observe the decay of their norms with the iterations, along with the
decay of the norm of $e^k$.
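
The iteration (8.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the solution program of the book: the beam data, the indices `il` and `ir`, the load `f`, the use of the maximum norm, and the names `tol` and `maxit` are all choices made here for the example.

```matlab
% Overlapping Schwarz iteration in 1D, two subdomains ]a,x_r[ and ]x_l,b[.
a = 0;  b = 1;  ua = 0.1;  ub = 0;          % beam data (assumption)
n = 99;  h = (b - a)/(n + 1);               % n odd
il = 40;  ir = 60;                          % symmetric about (n+1)/2 = 50
x  = a + (1:n)'*h;
c  = 10*ones(n,1);
f  = 1 + 9*(x >= 0.2 & x <= 0.6);           % weight plus overload (assumption)
e  = ones(n,1);
Ah = spdiags([-e, 2 + h^2*c, -e], -1:1, n, n)/h^2;
Bh = f;  Bh(1) = Bh(1) + ua/h^2;  Bh(n) = Bh(n) + ub/h^2;
U  = Ah \ Bh;                               % global reference solution

Al = Ah(1:ir-1, 1:ir-1);                    % operator on ]a, x_r[
Ar = Ah(il+1:n, il+1:n);                    % operator on ]x_l, b[
xr = a + ir*h;
w  = (ua*(b - xr) + ub*(xr - a))/(b - a);   % alpha: initial guess at x_r
tol = 1e-8;  maxit = 500;
for k = 1:maxit
    Bl = f(1:ir-1);
    Bl(1) = Bl(1) + ua/h^2;  Bl(end) = Bl(end) + w/h^2;
    V  = Al \ Bl;                           % solution on the left subdomain
    Br = f(il+1:n);
    Br(1) = Br(1) + V(il)/h^2;  Br(end) = Br(end) + ub/h^2;
    W  = Ar \ Br;                           % solution on the right subdomain
    w  = W(ir - il);                        % value at x_r for the next sweep
    ek = norm(V(il+1:ir-1) - W(1:ir-il-1), inf);   % gap in the overlap
    if ek < tol, break; end
end
el = norm(U(1:ir-1) - V, inf);              % error w.r.t. the global solution
er = norm(U(il+1:n) - W, inf);
```

The local index j of W corresponds to the global index il + j, which is why the value at x_r is read from W(ir - il) and the value at x_l from V(il).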


Exercise 8.1. Write a program to implement algorithm (8.3). Display in the
same graphics window, but with different colors, the solutions $V^k$, $W^k$,
and $U$, refreshing the graphics at each iteration $k$. One should obtain a
sequence of graphs as in Fig. 8.2. Represent the evolution of the three errors
$\|e^k\|$, $\|e_l^k\|$, and $\|e_r^k\|$ as functions of $k$ in another graphics
window, as in Fig. 8.1.

A solution of this exercise is proposed in Sect. 8.5 at page 181. 


Exercise 8.2. Modify the program of Exercise 8.1 and turn it into a function 
receiving as input argument the number of points no = ir —iı— [1 in the overlap 
region. This function computes and returns as output arguments the number 
of iterations necessary to reach the tolerance error and the computing time. 
Write a program that calls this function for varying values of $n_o$ and analyze
the influence of the size of the overlap region on the algorithm’s convergence. 
A solution of this exercise is proposed in Sect. 8.5 at page 182. 


Figures 8.1 and 8.2 illustrate the results for a beam of length 1 meter, with 
a constant stiffness coefficient c = 10. The left end of the beam is fixed equal 
to 10 cm higher than the right end. The beam is subject to its own weight of 
1 N/m as well as to an overload of 9 N/m on a 40 cm portion, starting 20 cm 
from its left end. 


Fig. 8.1. Logarithm of the L° norm of the error versus the number of iterations.

Fig. 8.2. Right-hand side f(x), global, and local solutions for varying numbers of
iterations (panels shown: iteration number 1 and iteration number 3).

8.4 Extension to the Two-Dimensional Case 


We now focus on the 2D problem (8.1) set on a rectangle. We restrict ourselves
to the case $c = 0$, thus modeling a steady heat conduction 3D problem in a
metallic piece where one dimension is much larger than the two others (see
Fig. 8.3). Variations in the temperature will be neglected in this direction.
In the first case, we impose inhomogeneous Dirichlet conditions on the
boundary:

\[
\begin{cases}
-\Delta u(x_1, x_2) = F(x_1, x_2), & \text{for } (x_1, x_2) \in\, ]a_1, b_1[ \times ]a_2, b_2[, \\
u(a_1, x_2) = f_2(x_2), & \text{for } x_2 \in\, ]a_2, b_2[, \\
u(b_1, x_2) = g_2(x_2), & \text{for } x_2 \in\, ]a_2, b_2[, \\
u(x_1, a_2) = f_1(x_1), & \text{for } x_1 \in\, ]a_1, b_1[, \\
u(x_1, b_2) = g_1(x_1), & \text{for } x_1 \in\, ]a_1, b_1[.
\end{cases} \tag{8.4}
\]

From a practical point of view this computation models, for instance, a
thermal shock test on a metallic beam. An experimental setup can consist,
for instance, of a null temperature on the faces $x_1 = a_1$ and $x_1 = b_1$,
that is, $f_2(x_2) = g_2(x_2) = 0$, a temperature of 50°C on the face
$x_2 = a_2$, that is, $f_1(x_1) = 50$, and a temperature of 100°C on the face
$x_2 = b_2$, that is, $g_1(x_1) = 100$. Furthermore, since no internal heat
sources are present, the right-hand side is null: $F(x_1, x_2) = 0$.


8.4.1 Finite Difference Solution 


We restrict ourselves to the case in which the domain dimensions are such that
it is possible to use the same discretization step in both directions $x_1$
and $x_2$. We therefore define $h = \frac{b_1 - a_1}{n_1 + 1} = \frac{b_2 - a_2}{n_2 + 1}$.
On this regular grid (see Fig. 8.4), we denote by
$u_{i,j} = u(a_1 + ih, a_2 + jh)$ (respectively $f_{i,j}$) the discretized
value of the solution $u(x_1, x_2)$ (respectively $F(x_1, x_2)$) at the
discretization points. In that case, using Taylor expansions in $x_1$ and in
$x_2$ in order to approximate partial derivatives in both directions, we
obtain the Laplacian discretization with a five-point scheme,

\[ -\Delta u(a_1 + ih, a_2 + jh) \approx \frac{4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1}}{h^2}, \]

which is an $O(h^2)$ approximation if $u$ is smooth enough (i.e., $u \in C^4$).
The finite difference solution $U$ is hence a solution of the following linear
system:

\[ \frac{4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1}}{h^2} = f_{i,j}, \tag{8.5} \]

where the unknowns are the $u_{i,j}$ for $i = 1, \ldots, n_1$ and
$j = 1, \ldots, n_2$. In fact, the values of the solution on the boundaries,
corresponding to indices $i = 0$ or $i = n_1 + 1$ and $j = 0$ or
$j = n_2 + 1$, are set by the boundary conditions

\[
\begin{aligned}
u_{0,j} &= f_2(a_2 + jh), & u_{n_1+1,j} &= g_2(a_2 + jh), \\
u_{i,0} &= f_1(a_1 + ih), & u_{i,n_2+1} &= g_1(a_1 + ih).
\end{aligned}
\]

Each row of the linear system (8.5) has at most five nonzero terms: the
diagonal coefficient is $\frac{4}{h^2}$, and for the off-diagonal terms,
corresponding to the neighbors with indices

\[ (i+1, j), \quad (i-1, j), \quad (i, j-1), \quad \text{and} \quad (i, j+1), \]

which do not belong to the boundary, the coefficients are equal to
$-\frac{1}{h^2}$. These nodes are represented by squares in Fig. 8.4.

Fig. 8.3. Sketch of the metallic piece subject to a thermal shock.

Fig. 8.4. Discretization of the domain interior $\Omega$ using $n_1 \times n_2$ points.

We can build the matrix using block matrix symbolism: the degrees of freedom
$(i, j)$, $(i+1, j)$, and $(i-1, j)$ are neighbors in the grid and also
consecutive in the global numbering of the degrees of freedom. The
coefficients of the linear system that link nodes belonging to a given row
$j$, for $j = 1, \ldots, n_2$, can therefore be rewritten as a tridiagonal
matrix

\[
T = \frac{1}{h^2}
\begin{pmatrix}
4 & -1 & 0 & \cdots & 0 \\
-1 & 4 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 4 & -1 \\
0 & \cdots & 0 & -1 & 4
\end{pmatrix}. \tag{8.6}
\]

The two other neighbors of the node $(i, j)$ in the grid are the nodes
$(i, j-1)$ and $(i, j+1)$, which are $n_1$ nodes away on each side of the
central node in the global numbering, and their connection is ensured through
a diagonal matrix $D = -\frac{1}{h^2} I$ on each side of the matrix $T$. The
matrix of the global linear system $AU = B$ is hence a block tridiagonal
matrix of $n_2 \times n_2$ blocks, each block being of size $n_1 \times n_1$:

\[
A =
\begin{pmatrix}
T & D & 0 & \cdots & 0 \\
D & T & D & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & D & T & D \\
0 & \cdots & 0 & D & T
\end{pmatrix}. \tag{8.7}
\]

The right-hand side $B$ of the linear system is a vector of size
$n_1 \cdot n_2$, which can be built as a matrix to benefit from the numbering
of the degrees of freedom associated with the grid. We start by initializing
the right-hand-side vector using the right-hand-side function $F(x_1, x_2)$ at
the grid nodes:

\[ B_{i,j} = f_{i,j}, \quad \text{for } 1 \le i \le n_1, \ 1 \le j \le n_2. \tag{8.8} \]

If the node $(i, j)$ has a neighbor $(i', j')$ on the grid boundary
(represented by a gray circle in Fig. 8.4), the contribution
$\frac{u_{i',j'}}{h^2}$ of this neighbor in the Laplacian discretization at
point $(i, j)$ must be added to the $(i, j)$ right-hand-side coefficient, the
value of $u_{i',j'}$ being set by the boundary condition. The boundary
condition contributions are therefore added to all terms $B_{1,j}$,
$B_{n_1,j}$ for $j = 1, \ldots, n_2$ and $B_{i,1}$, $B_{i,n_2}$ for
$i = 1, \ldots, n_1$. Beware of the special case occurring at the four
corners!

\[
\begin{aligned}
B_{1,j} &= B_{1,j} + \frac{f_2(a_2 + jh)}{h^2}, &
B_{n_1,j} &= B_{n_1,j} + \frac{g_2(a_2 + jh)}{h^2}, && \text{for } 1 \le j \le n_2, \\
B_{i,1} &= B_{i,1} + \frac{f_1(a_1 + ih)}{h^2}, &
B_{i,n_2} &= B_{i,n_2} + \frac{g_1(a_1 + ih)}{h^2}, && \text{for } 1 \le i \le n_1.
\end{aligned} \tag{8.9}
\]
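
The construction (8.8) and (8.9) can be sketched as follows, with the right-hand side stored as an n1-by-n2 matrix and then reshaped into a vector. The sketch uses the manufactured solution sin(x1 + x2), for which F = 2 sin(x1 + x2); the variable names and the Kronecker assembly of the matrix are assumptions made here, not the functions requested in the exercises.

```matlab
% Right-hand side (8.8)-(8.9) and global solve for u = sin(x1 + x2).
a1 = 0;  a2 = 0;  b1 = 1;  b2 = 2;  n1 = 20;
h  = (b1 - a1)/(n1 + 1);
n2 = round((b2 - a2)/h) - 1;                % same step h in both directions
x1 = a1 + (1:n1)'*h;                        % column vector of abscissas
x2 = a2 + (1:n2)*h;                         % row vector of ordinates
[X2, X1] = meshgrid(x2, x1);                % X1(i,j) = x1(i), X2(i,j) = x2(j)
uex = @(s,t) sin(s + t);                    % exact solution
B = 2*sin(X1 + X2);                         % (8.8): values of F at the nodes
B(1,:)  = B(1,:)  + uex(a1, x2)/h^2;        % face x1 = a1 (function f2)
B(n1,:) = B(n1,:) + uex(b1, x2)/h^2;        % face x1 = b1 (function g2)
B(:,1)  = B(:,1)  + uex(x1, a2)/h^2;        % face x2 = a2 (function f1)
B(:,n2) = B(:,n2) + uex(x1, b2)/h^2;        % face x2 = b2 (function g1)
e1 = ones(n1,1);  e2 = ones(n2,1);
T  = spdiags([-e1 4*e1 -e1], -1:1, n1, n1)/h^2;
S  = spdiags([e2 e2], [-1 1], n2, n2);
A  = kron(speye(n2), T) - kron(S, speye(n1))/h^2;   % matrix (8.7)
U  = reshape(A \ B(:), n1, n2);             % solution on the grid
err = max(max(abs(U - uex(X1, X2))));       % maximum nodal error
```

The four corner entries of B receive two boundary contributions each, one from each adjacent face, simply because both updates are applied to them.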


Fig. 8.5. (a) Global solution and (b) error between exact solution and finite
difference approximation.


Exercise 8.3. 1. Write a function DDM_LaplaceDirichlet to compute the
matrix $A$ of size $n_1 n_2 \times n_1 n_2$ of the global linear system. The
function computes and returns the matrix $A$. It receives in its input
arguments the lower bounds of the domain, $a_1$ and $a_2$; the number of
points in each direction, $n_1$ and $n_2$; and the discretization step $h$.
The matrix $A$ will be built by blocks using a tridiagonal matrix $T$ and the
identity matrix, both of size $n_1 \times n_1$.

2. Write a function DDM_RightHandSide2d to compute the right-hand-side
vector of the linear system. The function returns the vector $B$. It receives
in its input arguments the lower bounds of the domain, $a_1$ and $a_2$; the
number of points in each direction, $n_1$ and $n_2$; and the discretization
step $h$. It calls the function rhs2d(x1,x2), which computes the value of the
right-hand-side function $F(x_1, x_2)$.

3. Write a function DDM_FinDif2d to compute the finite difference solution.
The calling sequence should be

function [n2,b2,Solm]=DDM_FinDif2d(n1,a1,a2,b1,b2,rhs2d,...
f1,g1,f2,g2,RightHandSide2d,Laplace),

where f1,g1,f2,g2 are the names of the functions defining the boundary
conditions.

4. We select a function $u(x_1, x_2) = \sin(x_1 + x_2)$ that is the exact
solution of the problem $-\Delta u = F$ with the right-hand-side function
$F(x_1, x_2) = 2\sin(x_1 + x_2)$ and with boundary conditions equal to the
restrictions of the exact solution on the boundaries:

\[
\begin{aligned}
f_1(x_1) &= \sin(x_1 + a_2), & g_1(x_1) &= \sin(x_1 + b_2), \\
f_2(x_2) &= \sin(a_1 + x_2), & g_2(x_2) &= \sin(b_1 + x_2).
\end{aligned}
\]

Program the functions DDM_rhs2dExact(x1,x2), DDM_f1Exact(x1),
DDM_g1Exact(x1), DDM_f2Exact(x2), DDM_g2Exact(x2) corresponding to
this test case, along with the function DDM_u2dExact(x1,x2), which will
be used to compute the exact solution at the grid discretization points.

5. Write a program DDM_TestFinDif2d to test the previously defined
functions: the size of the domain must be carefully chosen to ensure that the
discretization step is the same in both directions. The solution computed
with the parameters $a_1 = a_2 = 0$, $b_1 = 1$, $b_2 = 2$, $n_1 = 20$ is
represented in Fig. 8.5(a). Check the computation by displaying the error,
that is, the difference between the exact solution
$u(x_1, x_2) = \sin(x_1 + x_2)$ and the finite difference solution, as in
Fig. 8.5(b).

6. Modify the previous program and adapt it to the thermal shock case (8.4)
defined in Sect. 8.4, to obtain the solution displayed in Fig. 8.6. Use first
a square domain of size $b_1 - a_1 = b_2 - a_2 = 6$, then a rectangular one of
dimensions $b_1 - a_1 = 6$ and $b_2 - a_2 = 20$.

A solution of this exercise is proposed in Sect. 8.5 at page 182.








8.4.2 Domain Decomposition in the Two-Dimensional Case 


We will now apply the technique described in Sect. 8.3 in the 2D case. The
global domain is decomposed into $n_s$ subdomains with an overlap in the
direction $x_2$. Here again, in order to simplify the implementation, we assume
that the domain can be discretized with the same step $h$ in both directions.
Furthermore, we also impose that all subdomains have the same size
$(n_2 + 1)h$ and that all overlap regions have the same number of grid cells
$n_o$. Figure 8.7 shows an example of a decomposition satisfying these
constraints. Note that they are very restrictive and might not be satisfied
for an arbitrary domain $[a_1, b_1] \times [a_2, b_2]$. It might be necessary
to adjust the total length $b_2 - a_2$.

Fig. 8.7. Decomposition into four identical subdomains with constant overlap.

We denote by $u^{s,k}$ the solution on the $s$th subdomain
$[a_1, b_1] \times [a_2^s, b_2^s]$, for $s = 1, \ldots, n_s$, at iteration
number $k$, for $k = 0, 1, \ldots$. The bounds of the subdomain, $a_2^s$ and
$b_2^s$, are given by $a_2^1 = a_2$ and
$a_2^s = a_2^{s-1} + (n_2 + 1 - n_o)h$ for $s > 1$, and
$b_2^s = a_2^s + (n_2 + 1)h$.
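
These constraints can be checked with a few lines of MATLAB. Writing M = (b2 - a2)/h for the total number of cells in the direction x2, the definitions above give M = (ns - 1)(n2 + 1 - no) + (n2 + 1), so a decomposition exists only if (M + (ns - 1) no)/ns is an integer. The sketch below tests this condition for a few values of ns; the numerical data and variable names are assumptions chosen for illustration.

```matlab
% Feasibility of a decomposition into ns identical overlapping subdomains.
a2 = 0;  b2 = 50;  h = 0.1;  no = 4;         % data (assumption)
M  = round((b2 - a2)/h);                     % number of cells in direction x2
for ns = 5:10
    m = (M + (ns - 1)*no)/ns;                % candidate value of n2 + 1
    if m == round(m) && m > no
        n2  = m - 1;                         % points per subdomain in x2
        a2s = a2 + (0:ns-1)*(n2 + 1 - no)*h; % lower bounds a_2^s
        b2s = a2s + (n2 + 1)*h;              % upper bounds b_2^s
        fprintf('ns = %d is feasible with n2 = %d\n', ns, n2);
    end
end
```

When no value of ns in the range of interest passes the test, the total length b2 - a2 has to be adjusted, as noted in the text.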

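With this notation, $u^{s,k}$ is the solution of the problem

\[
\begin{cases}
-\Delta u^{s,k}(x_1, x_2) = F(x_1, x_2), & \text{for } (x_1, x_2) \in (a_1, b_1) \times (a_2^s, b_2^s), \\
u^{s,k}(a_1, x_2) = f_2(x_2), & \text{for } x_2 \in (a_2^s, b_2^s), \\
u^{s,k}(b_1, x_2) = g_2(x_2), & \text{for } x_2 \in (a_2^s, b_2^s), \\
u^{s,k}(x_1, a_2^s) =
\begin{cases}
f_1(x_1), & \text{for } s = 1, \\
u^{s-1,k}(x_1, a_2^s), & \text{for } s = 2, \ldots, n_s,
\end{cases} \\
u^{s,k}(x_1, b_2^s) =
\begin{cases}
g_1(x_1), & \text{for } s = n_s, \\
u^{s+1,k-1}(x_1, b_2^s), & \text{for } s = 1, \ldots, n_s - 1.
\end{cases}
\end{cases}
\]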


At the first iteration, the boundary condition on the right end of the
subdomain is not defined, except for the last one, where it is the global
boundary condition. For all others it must therefore be arbitrarily fixed, for
instance by linear interpolation of the global boundary conditions $f_1$ and
$g_1$:

\[ u^{s+1,0}(x_1, b_2^s) = \frac{(b_2 - b_2^s) f_1(x_1) + (b_2^s - a_2) g_1(x_1)}{b_2 - a_2}. \]

The iterations stop whenever the sum (or the maximum) of the error norms
on the overlap regions is below a given tolerance.

We denote by $X_1$ the vector of the abscissas $a_1 + jh$, for
$j = 1, \ldots, n_1$, and by $X_2^s$ the vector of the ordinates in the $s$th
subdomain, $a_2 + jh + (s - 1)(n_2 + 1 - n_o)h$, for $j = 1, \ldots, n_2$. The
Schwarz algorithm in two dimensions can be written as


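\[
\begin{array}{l}
\text{Initialization:} \\
\quad U_{a,1} = f_1(X_1), \quad U_{b,n_s} = g_1(X_1), \\
\quad U_b^s = \big( (b_2 - b_2^s) U_{a,1} + (b_2^s - a_2) U_{b,n_s} \big) / (b_2 - a_2), \quad \text{for } s = 1, \ldots, n_s - 1, \\
\quad U_l^s = f_2(X_2^s), \quad U_r^s = g_2(X_2^s), \quad B^s = F(X_1, X_2^s), \\
\quad B^s_{1,:} = B^s_{1,:} + U_l^s / h^2, \quad B^s_{n_1,:} = B^s_{n_1,:} + U_r^s / h^2, \\[4pt]
\text{for } k = 1, 2, \ldots \text{ do} \\
\quad \text{for } s = 1, \ldots, n_s \text{ do} \\
\qquad \text{if } s = 1, \ U_a = U_{a,1}, \ \text{else } U_a = U^{s-1,k}_{:,\, n_2 + 1 - n_o} \\
\qquad \text{if } s = n_s, \ U_b = U_{b,n_s}, \ \text{else } U_b = U^{s+1,k-1}_{:,\, n_o} \ (\text{or } U_b^s \text{ when } k = 1) \\
\qquad B^{s,k} = B^s, \quad B^{s,k}_{:,1} = B^s_{:,1} + U_a / h^2, \quad B^{s,k}_{:,n_2} = B^s_{:,n_2} + U_b / h^2 \\
\qquad \text{solve } A U^{s,k} = B^{s,k} \\
\qquad \text{if } s > 1, \ R^s = U^{s,k} - U^{s-1,k} \text{ on the overlap region} \\
\quad \text{end} \\
\quad E^k = \sup_{s = 2, \ldots, n_s} \| R^s \| \\
\quad \text{if } E^k < \varepsilon, \text{ end} \\
\text{end}
\end{array} \tag{8.10}
\]

In this algorithm, $A$ is the matrix resulting from the discretization of the
operator $-\Delta$, defined by (8.7). It is here the same matrix for all
subdomains.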


Exercise 8.4. 1. Write a function DDM_Schwarz2d to implement the above
algorithm. The calling syntax should be

function [conviter,cpu,mem,n2,b2]=DDM_Schwarz2d(n1,ns,no,...
a1,a2,b1,b2,rhs2d,f1,g1,f2,g2,RightHandSide2d,Laplace,n11),

where the input parameters are

• n1, the number of cells in direction $x_1$,
• n11, the number of degrees of freedom in direction $x_1$,
• ns, the number of subdomains,
• no, the number of cells in the overlap regions in direction $x_2$,
• Laplace, the name of the function to compute the Laplacian
  discretization matrix,
• RightHandSide2d, the name of the function to compute the right-hand
  side,
• rhs2d, the name of the function $F(x_1, x_2)$,
• f1, the name of the function $f_1(x_1)$ defining the boundary condition
  on the edge $x_2 = a_2$,
• g1, the name of the function $g_1(x_1)$ defining the boundary condition
  on the edge $x_2 = b_2$,
• f2, the name of the function $f_2(x_2)$ defining the boundary condition
  on the edge $x_1 = a_1$,
• g2, the name of the function $g_2(x_2)$ defining the boundary condition
  on the edge $x_1 = b_1$.

The function returns as output arguments

• conviter, the number of iterations necessary to have a maximum error
  in the overlap regions below the specified tolerance tol,
• cpu, the computing time,
• mem, the necessary memory.

2. Write a program DDM_TestSchwarz2d to test the algorithm with the same
function $F$ as in the global case for the following parameter values:
$a_1 = a_2 = 0$, $b_1 = 1$, $n_1 = 9$, $b_2 = 30$, $n_o = 10$, $n_s = 20$.

A solution of this exercise is proposed in Sect. 8.5 at page 184.





Study of the method’s performance: we now study the influence of the sub- 
domain size on the convergence speed. We therefore need to estimate, for a 
given subdomain decomposition, the computing time necessary to achieve the 
specified accuracy. Computing time is measured within MATLAB using the 
commands tic and toc at the beginning and end of the script, or part of the 
script, that is to be monitored. Since it is the elapsed time that is actually 
measured, this is best done on a single-user computer. Furthermore, in order 
to be able to compare several configurations, only one parameter should vary. 
We keep constant the dimensions of the global domain, which imposes con- 
straints on the number of subdomains, their size, and the size of the overlap 
region. 
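
A minimal timing pattern with tic and toc is sketched below. The monitored computation (a dense linear solve of arbitrary size) is only a placeholder chosen for this example; in the exercises it would be replaced by the Schwarz iterations.

```matlab
% Measuring the elapsed time of a part of a script with tic and toc.
t0  = tic;                         % start a timer and keep its identifier
M   = rand(200) + 200*eye(200);    % placeholder workload (assumption)
y   = M \ ones(200,1);
cpu = toc(t0);                     % elapsed time in seconds since t0
fprintf('elapsed time: %g s\n', cpu);
```

Keeping the identifier returned by tic makes it possible to time several nested or overlapping parts of a script independently.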








Exercise 8.5. Fix the parameters $b_1 = 1$, $b_2 = 50$, $n_1 = 9$, and the
overlap size $n_o = 4$. We assume that realistic values for the number $n_s$ of
subdomains will run from 5 to 60. For each value of $n_s$ within this range,
check whether the decomposition is possible, and if it is, compute the
solution using the function DDM_Schwarz2d. Display the performance in
computing time, memory, and number of iterations, as a function of the
parameters $n_o$ and $n_s$. Analyze the influence of the overlap size on the
algorithm's convergence.

A solution of this exercise is proposed in Sect. 8.5 at page 187. 


8.4.3 Implementation of Realistic Boundary Conditions 


More realistic heat conduction test cases require the implementation of
additional boundary conditions besides the Dirichlet one that we have used so
far. Let us consider, for instance, the temperature field within a bus bar like
the one sketched in Fig. 8.8. The electric field produces heat at the uniform
rate $F(x_1,x_2) = q$ = 10° W/m?. The following temperatures are imposed on
the electrodes: 40°C on the left end and 10°C on the right end, thanks to
a cooling liquid circulation device.


Fig. 8.8. Bus bar sketch (labels: "Cooling", "Electric current").


The two lateral faces of the bar as well
as the bottom face are insulated, meaning that a Neumann boundary
condition has to be imposed. On the upper face, we impose a Fourier, or Robin,
boundary condition in order to model the natural convection-driven cooling
phenomenon. The thermal transfer coefficient is equal to $c_{th}$ = 75 W/m? and
the outside temperature is 0°C. The thermal diffusivity coefficient of the alloy
is equal to k = 20 W/m K. Solving for $\tilde{u} = u - u_{ext}$, problem (8.1) becomes
in this particular case


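$$
\begin{cases}
-k\,\Delta u(x_1,x_2) = q, & \text{for } x \in \Omega,\\
u = 40, & \text{on } x_2 = a_2,\\
u = 10, & \text{on } x_2 = b_2,\\
\dfrac{\partial u}{\partial n} = 0, & \text{on } x_1 = b_1,\\
\dfrac{\partial u}{\partial n} + c_{th}\,u = 0, & \text{on } x_1 = a_1.
\end{cases}
\tag{8.11}
$$
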
Here $\partial/\partial n$ denotes the derivative with respect to the normal vector to the
surface. In order to discretize these Neumann and Fourier boundary conditions
we introduce in the system the degrees of freedom corresponding to the
nodes on the faces $x_1 = a_1$ and $x_1 = b_1$. There are now $n_1 + 2$ nodes in the
$x_1$ direction instead of the $n_1$ in the Dirichlet case. For the nodes where the
Neumann condition applies, we write

$$u_{n_1+1,j} = u_{n_1+2,j},$$

which eliminates the outside node reference $u_{n_1+2,j}$ in the Laplacian
discretization at nodes of indices $(n_1+1, j)$, giving eventually

$$\frac{3u_{n_1+1,j} - u_{n_1,j} - u_{n_1+1,j-1} - u_{n_1+1,j+1}}{h^2} = f_{n_1+1,j} = \frac{q}{k}.$$

On the other hand, the Fourier condition is discretized as

$$u_{0,j} - u_{-1,j} + h\,c_{th}\,u_{0,j} = 0,$$

which eliminates the reference to nodes $(-1, j)$ in the Laplacian discretization
at nodes of indices $(0, j)$. Substituting $u_{-1,j} = (1 + h\,c_{th})\,u_{0,j}$ leads eventually to

$$\frac{(3 - h\,c_{th})\,u_{0,j} - u_{1,j} - u_{0,j-1} - u_{0,j+1}}{h^2} = f_{0,j} = \frac{q}{k}.$$



Exercise 8.6. 1. Modify Algorithm (8.10) to take into account the Fourier 
and Neumann boundary conditions. 
2. Implement a function DDM_LaplaceFourier that builds the linear system 
tridiagonal block matrix. 
3. Implement functions DDM_RightHandSide2dFourier, DDM_f1BB, DDM_g1BB, 
and DDM_rhs2dBB for the bus bar problem. 
4. Modify function DDM_FinDif2d and script DDM_TestFinDif2d in order to 
treat this problem with the global finite difference algorithm. 
5. Modify function DDM_Schwarz2d and script DDM_TestSchwarz2d in order
to treat this problem with the Schwarz algorithm. 
Solutions of this exercise are proposed in Sect. 8.5 at page 183 for the 
global solution and at page 188 for the domain decomposition solution. 


The global treatment provides the solution displayed in Fig. 8.9. The influence
of the Fourier condition is clearly seen on the boundary at $x_1 = 0$, where the
solution decreases toward the outside temperature value.


Fig. 8.9. Temperature in the bus bar (plot title: "Finite difference 2D Laplacian solution").


8.4.4 Possible Extensions 


The first extension from the point of view of the domain decomposition is 
of course to adapt the implementation to a decomposition on subdomains of 
different sizes. The degrees of freedom bookkeeping is then more complicated, 
and each subdomain requires the computation of a specific matrix. Should
these matrices be computed once and for all and stored in memory, or
computed again at each iteration? What is the influence of this strategic issue on
the computing time? 

Another extension of the project can be the domain decomposition in both 
directions, which will enable the treatment of problems set on domains with 
complex geometry. The storage of the solution and the connection of the
degrees of freedom in the overlapping regions requires a rigorous implementation.


Solution of Exercises 8.1 and 8.2

The function DDM_FunSchwarz1d implements the Schwarz algorithm in the
case of two subdomains of the same size. This constraint greatly simplifies
the implementation, since the matrices of the local linear systems have the
same dimensions for both subdomains. In the case that the coefficient c(x) is
constant, the matrix is exactly the same in both subdomains. The enforcement
of the constraint requires a careful translation of the mathematical indices
into MATLAB (once again, keep in mind that the indices of an array in
MATLAB start from 1). One method consists in setting the number of space
steps in the global domain to an even number, and therefore the number of
discretization points, including the edges at x = a and x = b, to an odd number
$n_x$. The space step is denoted by $h = (b-a)/(n_x-1)$. Then the (even) number
of space steps within the overlap is fixed to $2n_o$, with the parameter $n_o$ sent
as input to the function. From these data the position $x_l$ of the left side of
the right-hand-side subdomain is computed:

$$x_l = 0.5(a + b) - n_o h,$$

along with the position $x_r$ of the right edge of the left-hand-side subdomain:

$$x_r = 0.5(a + b) + n_o h.$$

Eventually, the number of space steps in each subdomain is equal to

$$n_s = \frac{x_r - a}{h} = \frac{b - x_l}{h} = \frac{n_x - 1}{2} + n_o.$$

Once these parameters are set, the finite difference matrix for the subdomains,
of size $(n_s - 1) \times (n_s - 1)$, can be computed. Two right-hand-side vectors are also
defined. The influence of the boundary conditions at points $x_l$ and $x_r$ is not
included at this stage since they vary at each iteration.

The function DDM_FunSchwarz1d uses the function DDM_rhs1d to compute
the right-hand side.
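
The setup described above can be sketched as follows for the simplified problem −u'' = f with homogeneous Dirichlet conditions. This is an assumption made here for brevity: the project function DDM_FunSchwarz1d is not reproduced, the source term f = 1 is hypothetical, and all variable names are illustrative.

```matlab
% Two-subdomain Schwarz iteration in 1D (illustrative sketch)
a = 0; b = 1; nx = 41; no = 4;       % nx must be odd; 2*no steps of overlap
h  = (b-a)/(nx-1);
ns = (nx-1)/2 + no;                  % space steps per subdomain
xl = 0.5*(a+b) - no*h;               % left edge of the right subdomain
e  = ones(ns-1,1);
A  = (2*diag(e) - diag(e(2:end),1) - diag(e(2:end),-1))/h^2;
f  = @(x) ones(size(x));             % hypothetical source term
x1 = a  + h*(1:ns-1)';               % interior nodes, left subdomain
x2 = xl + h*(1:ns-1)';               % interior nodes, right subdomain
F1 = f(x1); F2 = f(x2);
u1 = zeros(ns-1,1); u2 = zeros(ns-1,1);
tol = 1e-8; err = 1; iter = 0;
while err > tol && iter < 500
    iter = iter + 1;
    % left solve: boundary value at xr taken from the right subdomain
    rhs = F1; rhs(end) = rhs(end) + u2(2*no)/h^2;
    u1 = A\rhs;
    % right solve: boundary value at xl taken from the left subdomain
    rhs = F2; rhs(1) = rhs(1) + u1(ns-2*no)/h^2;
    u2 = A\rhs;
    % maximum difference between the two solutions inside the overlap
    err = norm(u1(ns-2*no+1:ns-1) - u2(1:2*no-1), inf);
end
```

With f = 1 the exact solution is x(1−x)/2, which the second-order difference scheme reproduces up to rounding, so the remaining error is governed by the stopping tolerance.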

The output parameters of the function are the number of iterations
required to reach the convergence tolerance, and the computing time, estimated
using the tic and toc MATLAB commands.

To answer Exercise 8.1, the function is called once with $n_o = 10$ from
the script DDM_CallSchwarz1d. The parameter detail is set to 1 so that the
solution is displayed at each iteration, and the evolution of the error as a
function of the iterations is also displayed once convergence has been reached.





Fig. 8.10. Computing time performance of the decomposition versus the size of the
overlapping region (plot title: "CPU versus overlapping size").


The script DDM_PerfSchwarz1d does the performance study required in
Exercise 8.2. It calls the function DDM_FunSchwarz1d for all values of $n_o$ between
1 and $n_x/10$, and stores the corresponding number of iterations and computing
time in arrays. A graph of the computing time as a function of the overlap size
is displayed in Fig. 8.10. The method converges faster as the overlap increases,
and the computing time for each iteration does not vary much; therefore
the overall computing time decreases as the size of the overlap region increases.





Solution of Exercise 8.3 

To implement the 2D problem, it is useful to preserve the double
numbering of the discretization nodes associated with the Cartesian grid for the
graphical representation of the solution, the boundary conditions, and the
right-hand-side implementation. The global numbering of the degrees of
freedom in a column vector is needed only to solve the linear systems.
MATLAB can easily convert an $n_1 \times n_2$ array containing the unknowns $u_{i,j}$ into
an array $u_k$ with $k = 1,\ldots,n_1 \times n_2$, and conversely, as in the following script:


% If size(tab)=[n1,n2]
col=tab(:); % size(col)=[n1*n2,1] and col((j-1)*n1+i)=tab(i,j)
% inversely if size(col)=[n1*n2,1]
tab=zeros(n1,n2);
tab(:)=col;


We first give MATLAB programming solutions for the functions: the finite dif- 
ference matrix for the 2D Laplacian operator in the case of Dirichlet boundary 
conditions is built by the function DDM_LaplaceDirichlet. 
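
One common way to assemble such a matrix, shown here only as a sketch (it is not the listing of DDM_LaplaceDirichlet, and the sign and scaling conventions of that function may differ), uses Kronecker products with the column-wise numbering k = (j−1)n1 + i. The sizes below are arbitrary.

```matlab
% Five-point discretization of -Laplacian on an n1 x n2 grid of interior nodes
n1 = 4; n2 = 3; h = 0.1;                      % illustrative sizes
e1 = ones(n1,1); e2 = ones(n2,1);
T1 = spdiags([-e1 2*e1 -e1], -1:1, n1, n1);   % 1D second difference in x1
T2 = spdiags([-e2 2*e2 -e2], -1:1, n2, n2);   % 1D second difference in x2
% column-major numbering: the x1 index runs fastest
L = (kron(speye(n2), T1) + kron(T2, speye(n1)))/h^2;
```

The resulting sparse matrix is block tridiagonal, with n2 diagonal blocks of size n1 × n1, and it is symmetric positive definite with this sign convention.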

In the test case proposed in question 5, the boundary conditions
compatible with the exact solution

$$u(x_1,x_2) = \sin(x_1 + x_2)$$

are programmed in files DDM_f1Exact.m, DDM_g1Exact.m, DDM_f2Exact.m,
DDM_g2Exact.m. The right-hand-side function

$$f(x_1,x_2) = -\Delta u(x_1,x_2) = 2\sin(x_1 + x_2)$$

is programmed in the function DDM_rhs2dExact.m. The right-hand side of
the linear system in the case of Dirichlet boundary conditions is assembled
by the function DDM_RightHandSide2dDirichlet. It uses the right-hand-side
function as specified in (8.8) and the boundary conditions as specified
in (8.9). The computation of the finite difference solution is performed
by the function DDM_FinDif2dDirichlet. It receives in its input arguments
the functions DDM_f1Exact, DDM_g1Exact, DDM_f2Exact, DDM_g2Exact, and
DDM_rhs2dExact, whose local names are respectively f1, g1, f2, g2, rhs2d. It
is able to treat other test cases and other boundary conditions. In the present
case of inhomogeneous Dirichlet boundary conditions on all the boundaries,
the number of degrees of freedom in the $x_1$ (respectively $x_2$) direction is equal
to $n_1$ (respectively $n_2$), that is, the number of inside nodes in this direction.

The test case proposed in question 5 of Exercise 8.3 is treated in the
first part of the calling script DDM_TestFinDif2d. The finite difference
solution is compared with the exact solution by displaying their difference. The
solution of the thermal shock described in Fig. 8.3, for which the exact
solution is not known, is treated by calling the function DDM_FinDif2dDirichlet
with the functions DDM_rhs2dCT, DDM_f1CT, DDM_g1CT, and DDM_f2CT as
input arguments in order to compute the right-hand side. Finally, the last
computation performed in the script DDM_TestFinDif2d.m corresponds to
the bus bar problem of Exercise 8.6. It is actually done by the function
DDM_FinDif2dFourier. The Laplacian matrix is computed by
DDM_LaplaceFourier, where Neumann boundary conditions on the edge $x_1 = a_1$ and
Fourier boundary conditions on the edge $x_1 = b_1$ are handled. Different
functions DDM_RightHandSideFourier and DDM_rhs2dBB are used to compute the
right-hand side. The inhomogeneous Dirichlet conditions, defined on edges
parallel to $x_1$, are taken into account using functions DDM_f1BB and DDM_g1BB.


Solution of Exercise 8.4

Algorithm (8.10) corresponding to the Schwarz method in the case of
Dirichlet boundary conditions on all four edges is programmed in the function
DDM_Schwarz2dDirichlet below and is tested in the script DDM_TestSchwarz2d
for the two examples treated in the previous exercise:


function [conviter,cpu,mem,n2,b2]=DDM_Schwarz2dDirichlet(n1,...
    ns,no,a1,a2,b1,b2,f,f1,g1,f2,g2,detailed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% function [conviter,cpu,mem,n2,b2]=DDM_Schwarz2dDirichlet(n1,
%%          ns,no,a1,a2,b1,b2,f,f1,g1,f2,g2,detailed)
%% Exercise 8.4
%% Schwarz method of domain decomposition with overlap
%% for the finite difference solution of the boundary conditions
%% problem
%%     -nabla u=f on [a1,b1]x[a2,b2]
%%     + Dirichlet b. c.
%%     u(a1,x2)=f2(x2)   u(x1,a2)=f1(x1)
%%     u(b1,x2)=g2(x2)   u(x1,b2)=g1(x1)
%%
%% Input parameters:
%% n1  : number of interior points on [a1,b1] (h=(b1-a1)/(n1+1))
%% ns  : number of subdomains in the x2 direction
%% no  : number of cells in overlapping region
%% a1, a2, b1, b2: minimal and maximal abscissa and
%%       ordinates of the rectangular domain
%% f   : right-hand-side function of the problem
%% f1, g1, f2, g2 : functions defining the inhomogeneous
%%       Dirichlet boundary condition on the four edges of the domain
%% detailed: nonzero to have intermediate graphical displays.
%%
%% Output parameters:
%% conviter: number of iterations
%% cpu : computing time
%% mem : memory needed to store the solution and the matrix
%% n2  : number of points per subdomain in the x2 direction
%% b2  : total size of domain in the x2 direction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
tic % computing time counter start
h=(b1-a1)/(n1+1);
n2tot=round(b2/h); % there are n2tot cells in total domain,
n2=round((n2tot-no*(1-ns))/ns)-1; % and n2+1 cells in each subdomain
n2tot=ns*(n2+1)+no*(1-ns);
b2=a2+h*n2tot; % final global size
% memory needed to store the solution and the matrix
mem=(n1*n2)^2+ns*n1*n2;
%
% the size of each subdomain is n1 x n2
%
a11=a1-h; % Dirichlet condition on the edge // to X2
%
% Boundary conditions independent of iterations are set in arrays
Ua2l=feval(f1,a1+h*[1:n1])'; % boundary condition on edge x2=a2
Ub2r=feval(g1,a1+h*[1:n1])'; % boundary condition on edge x2=b2
% The right-hand side on each subdomain is an array to which the
% contribution of internal edges will be added at each iteration
RHS=zeros(n1*n2,ns);
starts=0; % starting index of subdomain s
Rhsm=zeros(n1,n2);
Solm=zeros(n1,n2);
for s=1:ns
   RHS(:,s)=DDM_RightHandSide2dDirichlet(f,h,n1,n2,a1,a2+starts*h);
   % Dirichlet boundary condition on edge x1=a1
   Ua1(s,:)=feval(f2,a2+starts*h+[1:n2]*h);
   % Dirichlet boundary condition on edge x1=b1
   Ub1(s,:)=feval(g2,a2+starts*h+[1:n2]*h);
   Rhsm(1,:)=Ua1(s,:)/h^2;
   Rhsm(n1,:)=Ub1(s,:)/h^2;
   RHS(:,s)=RHS(:,s)+Rhsm(:);
   % the scanned listing reads "i*Ub2r"; the loop index s is assumed here
   Solm(:,no)=(s*Ub2r+(ns-s)*Ua2l)/ns;
   Solcol(:,s)=Solm(:);
   starts=starts+n2+1-no;
end
Rhsm=zeros(n1,n2);
Lapl=-DDM_LaplaceDirichlet(h,n1,n2);
maxiter=100; conviter=1; err=1; epsilon=0.001;
Solcol=zeros(n1*n2,ns);
ERR=[];
while err>epsilon && conviter<maxiter
   starts=0;
   err=0;
   Ua2=Ua2l; % left edge in x2 contains n1+2 row
   for s=1:ns
      if s<ns
         % The boundary condition on the right edge is obtained
         % from the solution at previous iteration on the right-hand-side
         % neighboring subdomain
         Solmr=zeros(n1,n2); Solmr(:)=Solcol(:,s+1);
         Ub2=Solmr(:,no);
      else
         Ub2=Ub2r; % the exact boundary condition is used on the right edge.
      end
      %
      Rhsm(:,1)=Ua2/h^2;
      Rhsm(:,n2)=Ub2/h^2;
      Rhscol=RHS(:,s)+Rhsm(:);
      Solcol(:,s)=Lapl\Rhscol;
      Solm=zeros(n1,n2); Solm(:)=Solcol(:,s);
      % The boundary condition on the left edge is obtained for the
      % next subdomain on the right
      Ua2=Solm(:,n2-no+1);
      if s>1 % the overlapping region is extracted
         OVER=Solml(:,n2-no+2:n2)-Solm(:,1:no-1);
         err=max(err,norm(OVER(:),inf));
      end
      Solml=Solm;
      if detailed,
         surf(a2+h*starts+h*[1:n2],a11+h*[1:n1],Solm);
         title(['iteration ',int2str(conviter)])
         if s==1
            hold on
         end
      end
      starts=starts+n2+1-no;
   end
   ERR=[ERR,err];
   conviter=conviter+1;
end
cpu=toc; % computing time counter stops here
if detailed, % Visualization after convergence
   Ua2l=[feval(f1,a1); Ua2l; feval(f1,b1)];
   Ub2r=[feval(g1,a1); Ub2r; feval(g1,b1)];
   figure; hold on;
   Solmr=zeros(n1,n2); Solmr(:)=Solcol(:,1);
   % In the case of Dirichlet bc on edges // to x2 two rows
   % corresponding to the boundary conditions on x1=a1 and x1=b1
   % are added to the solution on each subdomain
   starts=0;
   Solmr=[Ua1(1,:);Solmr;Ub1(1,:)]; % Dirichlet bc
   Solmr=[Ua2l,Solmr]; % exact boundary condition on left edge
   surf(a2+h*starts+h*[0:n2],a1+h*[0:n1+1],Solmr);
   starts=starts+n2+1-no;
   for s=2:ns-1
      Solmr=zeros(n1,n2); Solmr(:)=Solcol(:,s);
      Solmr=[Ua1(s,:);Solmr;Ub1(s,:)];
      surf(a2+h*starts+h*[1:n2],a1+h*[0:n1+1],Solmr);
      starts=starts+n2+1-no;
   end
   Solmr=zeros(n1,n2); Solmr(:)=Solcol(:,ns);
   Solmr=[Ua1(ns,:);Solmr;Ub1(ns,:)];
   Solmr=[Solmr,Ub2r]; % exact boundary condition on right edge
   surf(a2+h*starts+h*[1:n2+1],a1+h*[0:n1+1],Solmr);
   title('Final solution')
end

An interesting programming feature is the array RHS indexed by the
subdomain, which contains the right-hand side of the linear system for the
corresponding subdomain. This array is initialized with the contribution of the
right-hand-side function $f(x_1,x_2)$ as well as the Dirichlet boundary conditions
on the global domain edges. The array is used in the subsequent iterations, in
the loop on subdomains, to initialize the right-hand-side vector Rhsm of the
local linear system. The contribution of the Dirichlet boundary condition on
internal edges, which depends on the solution on neighboring subdomains, is
then added before the linear system is solved.

The matrix Lapl of the linear system is the same for all subdomains and does
not depend on the solution; it is therefore computed once and for all outside
the loop on iterations.
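
Since the matrix does not change, one possible refinement (not used in the listing above, which calls the backslash operator at each iteration) is to factor it once and reuse the factors. The following is a minimal sketch with a stand-in tridiagonal matrix; the sizes, the right-hand sides, and the names Lf, Uf, Pf are illustrative assumptions.

```matlab
% Factor once, then perform only triangular solves inside the loop
n = 50; h = 1/(n+1);
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n)/h^2;   % stand-in for the subdomain matrix
[Lf, Uf, Pf] = lu(A);                       % Pf*A = Lf*Uf, computed once
u = zeros(n,1);
for k = 1:5                                 % stand-in for the Schwarz iterations
    % the right-hand side changes at each iteration (boundary contributions)
    rhs = ones(n,1) + k*[1; zeros(n-2,1); 1]/h^2;
    u = Uf\(Lf\(Pf*rhs));                   % two triangular solves
end
```

The trade-off is memory for the stored factors against the cost of refactoring at each solve, which is the question raised in Sect. 8.4.4.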
Figure 8.11 displays the solutions after 1 and 4 iterations for the first test
case, where the exact solution is known.

Fig. 8.11. Solutions computed on 20 subdomains, after (a) 1 and (b) 4 iterations.


Solution of Exercise 8.5 
To analyze the convergence, we propose the script DDM_Perf.m, which also
calls the function DDM_Schwarz2dDirichlet. The first test case is considered,
with this time a larger domain in the $x_2$ direction: $b_2 - a_2 = 50$. The width
of the domain remains equal to 1 and is discretized with 11 cells, leading to
$n_1 = 10$ degrees of freedom in the $x_1$ direction. The size of the overlap region is
fixed to 10 cells in the $x_2$ direction. All "reasonable" values for the number of
subdomains are tried in a loop, from $n_s = 5$ subdomains (which corresponds
to 117 degrees of freedom per subdomain in the $x_2$ direction) to $n_s = 60$
subdomains (which corresponds to 18 degrees of freedom per subdomain). Since
the discretization step must be the same in both directions $x_1$ and $x_2$, some
configurations are impossible: the step h is fixed by the number of points in
the $x_1$ direction, $n_1 = 10$, that is, $h = (b_1 - a_1)/(n_1 + 1)$. Therefore the number
of internal points in the $x_2$ direction is also fixed to $n_2^{tot} = (b_2 - a_2)/h$, with
the constraint that $n_2^{tot}$ must be an integer. Furthermore, the number of points
in the $x_2$ direction in one subdomain, $n_2$, taking into account the overlap
region, must satisfy $n_s(n_2 + 1) = n_2^{tot} - n_o(1 - n_s)$, while being also an integer.
Only the decomposition configurations leading to a total length of 50 with a
0.2% tolerance are considered.
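
The feasibility test described above can be sketched as follows, using the same formulas as in DDM_Schwarz2dDirichlet. The parameter values are those stated in Exercise 8.5; the script DDM_Perf.m itself is not reproduced, and the names n2tot_s, b2_s, and feasible are illustrative.

```matlab
% Which numbers of subdomains give a total length within 0.2% of b2?
a1 = 0; b1 = 1; a2 = 0; b2 = 50; n1 = 9; no = 4;
h = (b1-a1)/(n1+1);
n2tot = round((b2-a2)/h);                  % cells in the x2 direction
feasible = zeros(0,3);                     % rows: [ns, n2, resulting length]
for ns = 5:60
    n2 = round((n2tot - no*(1-ns))/ns) - 1;   % points per subdomain in x2
    n2tot_s = ns*(n2+1) + no*(1-ns);          % cells actually covered
    b2_s = a2 + h*n2tot_s;                    % resulting domain length
    if abs(b2_s - b2) <= 0.002*b2             % 0.2% tolerance on the length
        feasible(end+1,:) = [ns, n2, b2_s];
    end
end
```

Each row of feasible can then be passed to the Schwarz solver, and the returned iteration counts, times, and memory sizes stored for plotting.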

For these allowable configurations, the function DDM_Schwarz2dDirichlet
is called, with this time the input argument detailed set equal to 0 to
inhibit some of the intermediate graphical outputs. On the other hand, output
parameters return the number of iterations, the computing time, the memory
size, and the number of points per subdomain $n_2$. Figure 8.12 shows the results
obtained for two different sizes of the overlap regions: $n_o = 4$ and $n_o = 10$
cells in the $x_2$ direction. A comparison of the computing time curve with the
number of iterations curve is particularly interesting. One would expect that
the number of iterations should increase with the number of subdomains, but
since the computing time necessary for each subdomain decreases along with
their size, the evolution of the global computing time is less predictable. The
simulations indicate that the optimal number of subdomains might depend
on the size of the overlap.

For comparison, the direct computation of the global solution on 9 x 499
degrees of freedom would require 35 seconds of computing time, which is more
than the decomposition method requirement in the worst configuration case.
The memory necessary to store the matrix of the global linear system, on the
order of $2 \times 10^7$ entries, is also prohibitive.











Solution of Exercise 8.6 


We now denote by $X_1$ the vector containing the $n_1 + 2$ abscissas $a_1 + jh$,
$j = 0,\ldots,n_1 + 1$, including $a_1$ and $b_1$. The vectors $X_2^s$ of Algorithm (8.10) are
unchanged. The Laplacian matrix is modified in order to take into account
the Neumann and Fourier boundary conditions, using the block
representation (8.7). The T and D matrices are now of dimension $(n_1 + 2) \times (n_1 + 2)$,
and the matrix T is different from the previous one (8.6) in the first and last
rows.

Fig. 8.12. Performances of the decomposition versus the number of subdomains for
two different sizes of the overlap.

Since the outside temperature is equal to 0°C, there is no contribution of the
Fourier boundary condition on the right-hand side. The Schwarz algorithm is
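now as follows:

initialization:
    $U_{a_2}^{left} = f_2(X_1)$, $U_{b_2}^{right} = g_2(X_1)$
    $U_{n_o}^{s,0} = \left(s\,U_{b_2}^{right} + (n_s - s)\,U_{a_2}^{left}\right)/n_s$, for $s = 1,\ldots,n_s$
    $B^s = F(X_1, X_2^s)$, for $s = 1,\ldots,n_s$

for $k = 1, 2, \ldots$ do
    for $s = 1,\ldots,n_s$ do
        if $s < n_s$, $U_{b_2} = U_{n_o}^{s+1,k-1}$; else $U_{b_2} = U_{b_2}^{right}$
        if $s > 1$, $U_{a_2} = U_{n_2-n_o+1}^{s-1,k}$; else $U_{a_2} = U_{a_2}^{left}$                (8.12)
        $B^{s,k} = B^s$, $B_1^{s,k} = B_1^{s,k} + U_{a_2}/h^2$, $B_{n_2}^{s,k} = B_{n_2}^{s,k} + U_{b_2}/h^2$
        solve $A\,U^{s,k} = B^{s,k}$
        if $s > 1$, $R^s = U_{1:n_o-1}^{s,k} - U_{n_2-n_o+2:n_2}^{s-1,k}$
    end of s
    $E^k = \sup_s \|R^s\|_\infty$
    if $E^k < \varepsilon$, end of k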


This algorithm is programmed in the function DDM_Schwarz2dFourier and
tested in the third example of the script DDM_TestSchwarz2d.m.


Geometrical Design: Bézier Curves and Surfaces


Project Summary 


Level of difficulty: 2 
Keywords: Bézier curves, Bézier surfaces 


Application fields: Computer-aided geometric design, geometric model- 
ing, computational graphics 


9.1 Introduction 





Many fields in the computational science area need descriptions of complex ob- 
jects: virtual reality, computational graphics, geometric modeling, computer- 
aided geometric design (CAGD). These descriptions are commonly obtained 
using basic elements: points, curves, surfaces, and volumes. Elementary tools 
used to handle these elements are mathematical functions such as polynomi- 
als and rational functions, which allow easy graphical representation in many 
situations: union of objects, intersection, complement. 

The very first studies in geometrical design go back to the sixties and were 
related to industrial projects. For example, J. Ferguson (Boeing) and S. Coons 
(Ford) in the United States, P. de Casteljau (Citroën) and P. Bézier (Renault)
in France, were pioneers in the discipline. This chapter gives an introduction to 
geometrical design by studying some properties of the so-called Bézier curves 
and surfaces. 








9.2 Bézier Curves 


Let $n \ge 2$ be an integer and $t \in [0,1]$ a parameter; consider m + 1 points in
$\mathbb{R}^n$: $P_0, P_1, \ldots, P_m$ (distinct or not) and define the point P(t) by

$$P(t) = \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k}\, P_k, \tag{9.1}$$

where $C_m^k = \frac{m!}{k!\,(m-k)!}$ is the binomial coefficient. The Bézier curve $\mathcal{B}_m$ with
control points $P_0, P_1, \ldots, P_m$ is the trajectory described by P(t) as t goes
from 0 to 1. The polynomials $B_m^k(t) = C_m^k\, t^k (1-t)^{m-k}$ are the Bernstein
polynomials of degree m, with the following properties:

$$
\begin{aligned}
&\forall t \in [0,1],\quad \sum_{k=0}^{m} B_m^k(t) = 1,\\
&B_m^k(0) = 0 \ \text{ for } 0 < k \le m, \qquad B_m^0(0) = 1,\\
&B_m^k(1) = 0 \ \text{ for } 0 \le k < m, \qquad B_m^m(1) = 1.
\end{aligned}
\tag{9.2}
$$


It follows from (9.1) and (9.2) that $P(0) = P_0$ and $P(1) = P_m$. Generally,
$P_0$ and $P_m$ are the only control points on the curve $\mathcal{B}_m$. Definition (9.1) allows
one to represent, exactly and in a condensed form, a great diversity of curves
in $\mathbb{R}^n$. Fig. 9.1(a) displays an example of a Bézier curve defined in $\mathbb{R}^2$ with five
control points. Note that the order in which the control points are considered
in (9.1) will dictate the shape of the curve: for example, in Figs. 9.1(a) and
(b) the same control points are used, but $P_0$ and $P_1$ have been interchanged.
More generally, changing any control point will result in the entire
curve being modified.
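
The properties (9.2) are easy to check numerically. The following sketch, with an arbitrarily chosen degree m = 4 and illustrative variable names, tabulates the Bernstein polynomials on a grid of parameter values:

```matlab
% Row k+1 of B holds the Bernstein polynomial of index k evaluated on the grid t
m = 4;                               % illustrative degree
t = linspace(0,1,11);
B = zeros(m+1, numel(t));
for k = 0:m
    B(k+1,:) = nchoosek(m,k) * t.^k .* (1-t).^(m-k);
end
partition = sum(B,1);                % should equal 1 for every t
```

The first and last columns of B correspond to t = 0 and t = 1, where only the first and last polynomial, respectively, are nonzero, which is why the curve starts at the first control point and ends at the last one.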

Since Bernstein polynomials are linearly independent functions, two Bézier
curves coincide for the same value of m when they share the same control
points. Nevertheless, it is important to note that the same Bézier curve admits
different representations of type (9.1), corresponding to different values of m.
For example, consider two points $P_0$ and $P_1$ and define $Q_1$ to be the midpoint
of $P_0P_1$; the line segment $P_0P_1$ is defined by either

$$P(t) = \sum_{k=0}^{1} C_1^k\, t^k (1-t)^{1-k}\, P_k \qquad (m = 1)$$

or

$$Q(t) = \sum_{k=0}^{2} C_2^k\, t^k (1-t)^{2-k}\, Q_k \qquad (m = 2)$$

with $t \in [0,1]$, $Q_0 = P_0$, and $Q_2 = P_1$. If we introduce the new control
points $R_0 = P_0$, $R_1 = (2P_0 + P_1)/3$, $R_2 = (P_0 + 2P_1)/3$, and $R_3 = P_1$, the
Bézier curve corresponding to the definition

$$R(t) = \sum_{k=0}^{3} C_3^k\, t^k (1-t)^{3-k}\, R_k \qquad (m = 3)$$

with $t \in [0,1]$ is still the line segment $P_0P_1$ (check that P(t) = Q(t) = R(t)).
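
This identity can be verified numerically. The sketch below uses two hypothetical endpoints and evaluates the three representations (m = 1, 2, 3) at a few parameter values; all names are illustrative.

```matlab
% Three Bezier representations of the same line segment
P0 = [0 0]; P1 = [3 1];                        % hypothetical endpoints
Q = [P0; (P0+P1)/2; P1];                       % m = 2, midpoint as middle point
R = [P0; (2*P0+P1)/3; (P0+2*P1)/3; P1];        % m = 3
ctrl = {[P0; P1], Q, R};
t = linspace(0,1,7)';
curves = cell(1,3);
for c = 1:3
    C = ctrl{c}; m = size(C,1) - 1;
    X = zeros(numel(t), 2);
    for k = 0:m
        % weight column (one value per t) times the control point row
        X = X + (nchoosek(m,k) * t.^k .* (1-t).^(m-k)) * C(k+1,:);
    end
    curves{c} = X;                             % points of the curve, one per row
end
```

The three arrays in curves agree up to rounding, which illustrates that the number of control points alone does not identify a Bézier curve.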

Fig. 9.1. Two Bézier curves.


Remark 9.1. The control point $P_k$, for any $1 \le k < m$, does not generally
belong to the Bézier curve; it is, however, possible to introduce another type of
Bézier curve, called a Bézier interpolation curve, which contains all its control
points. Both kinds of curves belong to the family of spline curves. Spline
curves are generated using the general definition (9.3), in which the functions $f_k$
are polynomials of degree m:

$$P(t) = \sum_{k=0}^{m} f_k(t)\, P_k. \tag{9.3}$$

Many other curves (B-splines, NURBS) are defined by way of such a
formula (Coons (1974), Hoschek and Lasser (1997), or Piegl and Tiller (1995)).
In (9.3) the blending functions $f_k$ may be polynomials (of degree $p \ne m$),
rational functions, etc. All the corresponding curves are entirely defined by
setting the control points and choosing the associated functions. Note that a
curve may be also defined "piecewise", as the union of distinct curves sharing
the same endpoints. In this chapter we shall limit our study to Bézier curves defined
by formula (9.1).





9.3 Basic Properties of Bézier Curves 


In this section we study some properties of Bézier curves, which are relevant 
for practical applications. 


9.3.1 Convex Hull of the Control Points 


According to (9.1), the point P(t) is defined as the barycenter of the m + 1
control points $P_k$, with corresponding weights $B_m^k(t)$. It follows from the first
relationship in (9.2), together with the nonnegativity of the Bernstein
polynomials on [0,1], that P(t) belongs to the convex hull of the control points.
We may see in Fig. 9.2(a) that a Bézier curve lies entirely within the convex
hull of the control points. Note that this convex hull contains the polygon
$P_0P_1 \ldots P_mP_0$, which is commonly referred to as the control polygon. From a
more general point of view, it is worth noting that in many situations the
control polygon is not convex (as can be seen from Fig. 9.2(b)).
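
The convex hull property can be checked numerically with the built-in functions convhull and inpolygon. The control points below are hypothetical (they are not those of Fig. 9.2), and the variable names are illustrative.

```matlab
% Sample a Bezier curve and test that the samples lie in the convex hull
P = [0 0; 1 2; 3 3; 4 1; 2 -1];      % hypothetical control points, one per row
m = size(P,1) - 1;
t = linspace(0.05, 0.95, 19)';       % interior parameter values
C = zeros(numel(t), 2);
for k = 0:m
    C = C + (nchoosek(m,k) * t.^k .* (1-t).^(m-k)) * P(k+1,:);
end
K = convhull(P(:,1), P(:,2));        % indices of the hull vertices
inside = inpolygon(C(:,1), C(:,2), P(K,1), P(K,2));
```

Every entry of inside should be true; in practice this gives a cheap bounding region for clipping or intersection tests before the curve itself is evaluated.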

















Fig. 9.2. Control polygon. (a) convex; (b) nonconvex. (Panel titles: "Bézier curve with control polygon".)


9.3.2 Multiple Control Points 


When defining a Bézier curve, it is not necessary to use distinct control points. 
This allows us to create more or less complicated shapes, closed or open curves, 
as displayed in Figs. 9.3 and 9.4. 


Fig. 9.3. Multiple control points (1). (Panel titles: "Bézier curve with multiple control points".)


Fig. 9.4. Multiple control points (2). (Panel titles: "Bézier curve with multiple control points".)


9.3.3 Tangent Vector to a Bézier Curve 


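Let P(t) be the point of the Bézier curve $\mathcal{B}_m$ corresponding to the value t of
the parameter. The tangent vector to $\mathcal{B}_m$ at P(t) is defined by

$$\tau(t) = \frac{d}{dt}P(t) = \sum_{k=0}^{m} \frac{d}{dt}B_m^k(t)\, P_k. \tag{9.4}$$

It follows from the definition of the Bernstein polynomials $B_m^k$ that

$$
\frac{d}{dt}B_m^k(t) =
\begin{cases}
-m(1-t)^{m-1}, & \text{if } k = 0,\\
m(1-t)^{m-2}(1 - mt), & \text{if } k = 1,\\
C_m^k\, t^{k-1}(1-t)^{m-k-1}(k - mt), & \text{if } 1 < k < m-1,\\
m\,t^{m-2}(m - 1 - mt), & \text{if } k = m-1,\\
m\,t^{m-1}, & \text{if } k = m.
\end{cases}
$$

Consequently, when $P_0 \ne P_1$, the tangent vector at $P_0$ is $\tau(0) = m\,\overrightarrow{P_0P_1}$.
The Bézier curve $\mathcal{B}_m$ is tangent at $P_0$ to the edge $P_0P_1$ of the control polygon.
Similarly, if $P_{m-1} \ne P_m$, then $\tau(1) = m\,\overrightarrow{P_{m-1}P_m}$ and the curve is tangent at
$P_m$ to the edge $P_{m-1}P_m$. This property is illustrated in the previous figures.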


Remark 9.2. More generally, it can be proved that

$$\frac{d}{dt}B_{m+1}^k(t) = (m+1)\left(B_m^{k-1}(t) - B_m^k(t)\right).$$

This formula may be useful to compute the tangent vector $\tau(t)$.
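
Applied to the derivative of P(t), this derivative formula gives the tangent vector as a Bézier-type combination of the differences of consecutive control points, with Bernstein weights of degree m − 1. The sketch below assumes this rearrangement and uses hypothetical control points; bez and tanv are illustrative helper names that accept a scalar parameter value.

```matlab
% Tangent vector: m times the degree-(m-1) Bernstein combination of P_{k+1}-P_k
P = [0 0; 1 2; 3 3; 4 1];            % hypothetical control points, m = 3
m = size(P,1) - 1;
D = diff(P, 1, 1);                   % rows are P_{k+1} - P_k, k = 0..m-1
% point of the curve for a scalar t
bez  = @(t) (arrayfun(@(k) nchoosek(m,k), 0:m) ...
             .* t.^(0:m) .* (1-t).^(m-(0:m))) * P;
% tangent vector for a scalar t
tanv = @(t) m * (arrayfun(@(k) nchoosek(m-1,k), 0:m-1) ...
             .* t.^(0:m-1) .* (1-t).^(m-1-(0:m-1))) * D;
tau0 = tanv(0);                      % equals m*(P_1 - P_0)
tau1 = tanv(1);                      % equals m*(P_m - P_{m-1})
```

A centered finite difference of bez around any interior parameter value provides an independent check of tanv.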


9.3.4 Junction of Bézier Curves 


We now address the problem of linking two Bézier curves. Consider the Bézier
curve defined with m + 1 control points $P_0, P_1, \ldots, P_m$ and another curve
defined with m' + 1 control points $P'_0, P'_1, \ldots, P'_{m'}$. We are interested in studying
how these two curves connect, and more particularly how this junction looks
on a display. This is an important problem in CAGD, where maximum quality
in the rendering of pictures is expected.

In order to get a $C^0$ connection (that is, a continuous junction of the two
curves) we have to lay down a basic condition: $P_m = P'_0$. Then, since the
first curve $\mathcal{B}_m$ is tangent to the line segment $P_{m-1}P_m$ at $P_m$ and the second
curve $\mathcal{B}'_{m'}$ is tangent to $P'_0P'_1$ at $P'_0$, a tangential connection of the curves
is obtained if and only if the three points $P_{m-1}$, $P_m$ (or $P'_0$), and $P'_1$ lie on
a straight line. This condition, which is called the $G^1$ continuity condition,
is generally sufficient to get a satisfactory layout. Nevertheless, for a better
rendering, it is natural to ask for more, namely a $C^1$ continuity condition. This
will be satisfied if the tangent vector $\tau$ passes continuously from the first curve
to the second one. We know that the tangent vector to $\mathcal{B}_m$ at $P_m$ is $m\,\overrightarrow{P_{m-1}P_m}$,
while for $\mathcal{B}'_{m'}$ the tangent vector at $P'_0$ is $m'\,\overrightarrow{P'_0P'_1}$. When $m' = m$, the $C^1$
continuity condition is then satisfied when $\overrightarrow{P_{m-1}P_m} = \overrightarrow{P'_0P'_1}$. This is equivalent
to saying that $P_m = P'_0$ is the midpoint of the line segment $P_{m-1}P'_1$.

Figure 9.5 shows an example of a $G^1$ junction (left), together with an
example of a $C^1$ junction (right). The actual difference is not visible here,
but it is clear from Fig. 9.5(b) that $P_4$ (point $P_4 = P'_0$) is the midpoint of the
line segment $P_3P'_1$, while this is not true in Fig. 9.5(a). Distinguishing between
these two kinds of junction is important when one has to handle evenly spaced
points on the curve $\mathcal{B} = \mathcal{B}_m \cup \mathcal{B}'_{m'}$.
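
These junction conditions translate directly into tests on the control points. The following sketch uses hypothetical control points (not those of Fig. 9.5) and illustrative names; the requirement that the two edge vectors point in the same direction for the tangential test is an assumption added here to exclude a cusp.

```matlab
% Classify the junction between two Bezier curves from their control points
P   = [0 0; 1 2; 3 3; 4 2];          % first curve, m = 3 (hypothetical)
Pc1 = [4 2; 5 1; 6 1; 7 3];          % second curve, chosen to give a C1 junction
Pg1 = [4 2; 6 0; 6 1; 7 3];          % second curve, chosen to give only G1
m = size(P,1) - 1;
v1 = P(end,:) - P(end-1,:);          % last edge of the first control polygon
for test = {Pc1, Pg1}
    Pp = test{1}; mp = size(Pp,1) - 1;
    v2 = Pp(2,:) - Pp(1,:);          % first edge of the second control polygon
    isC0 = isequal(P(end,:), Pp(1,:));
    % collinear edges pointing the same way
    isG1 = isC0 && abs(v1(1)*v2(2) - v1(2)*v2(1)) < 1e-12 && dot(v1,v2) > 0;
    % equal tangent vectors at the junction
    isC1 = isC0 && norm(m*v1 - mp*v2) < 1e-12;
    fprintf('C0: %d, G1: %d, C1: %d\n', isC0, isG1, isC1);
end
```

Comparing m*v1 with mp*v2 rather than the raw edges keeps the test valid when the two curves have different degrees.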





Fig. 9.5. Junction of curves. (a) $G^1$ continuity; (b) $C^1$ continuity.


9.3.5 Generation of the Point P(t) 


Although the point P(t) is exactly defined from (9.1), the effective
construction of P(t) for a given value of the parameter $t \in [0,1]$ in this way is
time-consuming. Furthermore, since the evaluation of high-degree polynomials
is not an accurate process, the point resulting from (9.1) will generally be
different from the actual P(t). Fortunately, there is another way, cheap and
accurate, to obtain P(t) using the recurrence between Bernstein polynomials:

$$B_{m+1}^k(t) = t\,B_m^{k-1}(t) + (1-t)\,B_m^k(t). \tag{9.5}$$

This result is proved by writing

$$
\begin{aligned}
t\,B_m^{k-1}(t) + (1-t)\,B_m^k(t) &= C_m^{k-1}\, t^k (1-t)^{m+1-k} + C_m^k\, t^k (1-t)^{m+1-k}\\
&= \left(C_m^{k-1} + C_m^k\right) t^k (1-t)^{m+1-k}\\
&= C_{m+1}^k\, t^k (1-t)^{m+1-k} = B_{m+1}^k(t).
\end{aligned}
$$

This property is useful for displaying Bézier curves: for a chosen value
of $t \in [0,1]$, the points $P_q^p$, for $p = 0,1,\ldots,m$ and $q = p, p+1, \ldots, m$, are
successively defined by

initialization: for p = 0, do
    for q = 0,1,...,m, do
        $P_q^0 = P_q$
    end do
end do
                                                                (9.6)
construction: for p = 1,2,...,m, do
    for q = p,p+1,...,m, do
        $P_q^p = t\,P_q^{p-1} + (1-t)\,P_{q-1}^{p-1}$
    end do
end do

We shall now prove that $P_m^m = P(t)$. We first note that in (9.6), at step $p$, each of the $m-p+1$ points $P_q^p$ is defined as the barycenter of $P_{q-1}^{p-1}$ and $P_q^{p-1}$, which are two points obtained in the previous step. Using mathematical induction on $p$, we prove that $P_q^p$ satisfies, for $p = 0,1,\ldots,m$ and $q = p,p+1,\ldots,m$, the relationship

$$P_q^p = \sum_{j=0}^{p} B_p^j(t)\, P_{q-p+j}.$$

This is trivial when $p = 0$ because of the definition of the points $P_q^0$ ($q = 0,1,\ldots,m$). We then assume the property to be satisfied up to rank $p-1$ (included) and prove it for rank $p$. For $q = p,p+1,\ldots,m$, we write

$$P_q^p = t\,P_q^{p-1} + (1-t)\,P_{q-1}^{p-1} = \sum_{j=0}^{p} \left( t\,B_{p-1}^{j-1}(t) + (1-t)\,B_{p-1}^{j}(t) \right) P_{q-p+j} = \sum_{j=0}^{p} B_p^j(t)\, P_{q-p+j},$$

with the convention $B_{p-1}^{-1} = B_{p-1}^{p} = 0$; the last equality follows from (9.5). The relation is thus satisfied for rank $p$, and hence for any value of $p \le m$. When $p = m$, this relation leads to

$$P_m^m = \sum_{j=0}^{m} B_m^j(t)\, P_j = P(t).$$


It follows that any point $P(t)$ of the Bézier curve $B_m$ with control points $P_0, P_1, \ldots, P_m$ can be built by means of algorithm (9.6), which is called de Casteljau's algorithm. The computational cost of generating $P(t)$ in this way is equivalent to performing $m + \cdots + 1 = m(m+1)/2$ linear combinations; this is much cheaper (and more accurate) than the use of formula (9.1). The construction process is displayed in Fig. 9.6.
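Algorithm (9.6) can be sketched in a few lines of MATLAB for control points of any dimension. The script below is only an illustration under stated assumptions: the control points `P`, the value of `t`, and all variable names are chosen for the example and do not come from the text. It also compares the result with a direct evaluation of the Bernstein form (9.1).

```matlab
% Control points, one per row (hypothetical values chosen for illustration)
P = [0 0; 1 2; 3 3; 4 0];
t = 0.3;                        % parameter value in [0,1]
m = size(P,1) - 1;              % degree of the curve

% de Casteljau's algorithm (9.6): after step p, row q+1 of Q holds P_q^p
Q = P;
for p = 1:m
    % rows p+1..m+1 are updated together from the values of step p-1
    Q(p+1:m+1,:) = t*Q(p+1:m+1,:) + (1-t)*Q(p:m,:);
end
Pt = Q(m+1,:);                  % P_m^m = P(t)

% Cross-check against the Bernstein form (9.1)
k = (0:m)';
B = arrayfun(@(j) nchoosek(m,j), k) .* t.^k .* (1-t).^(m-k);
Pt_bernstein = B' * P;
disp(norm(Pt - Pt_bernstein))   % expected to be at round-off level
```

The vectorized update works because MATLAB evaluates the whole right-hand side before assigning it, so each step uses only values from the previous step.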


Fig. 9.6. De Casteljau's algorithm.


9.4 Generation of Bézier Curves 


It is now time to deal with a few examples. We shall see how easy it is to 
construct Bézier curves using de Casteljau’s algorithm. 


Exercise 9.1. 1. Choose $m+1$ points $P_0, P_1, \ldots, P_m$ in $\mathbb{R}^2$.

2. Write a general procedure generating a point $P(t)$ of the Bézier curve
with control points $P_0, P_1, \ldots, P_m$, using de Casteljau's algorithm (9.6),
for any value $t \in [0,1]$.

3. Compute and display the corresponding Bézier curve. 

4. Repeat the experiment for different values of m and different sets of control 
points. 

5. Check the different continuity conditions. 





A solution of this exercise is proposed in Sect. 9.10 at page 210. 

Remark 9.3. One may want to construct the curve displayed in Fig. 9.1(a). In this particular case, $m = 4$ and the control points are $P_0 = (0,0)$, $P_1 = (1, 2.5)$, $P_2 = (2,3)$, $P_3 = (3, 1.5)$, $P_4 = (3.5,0)$.


9.5 Splitting Bézier Curves 


Let $B_m$ be the Bézier curve defined by the $m+1$ control points $P_0, P_1, \ldots, P_m$. Let $\theta$ be a given value in $[0,1]$. The point $P(\theta)$ of $B_m$ is associated with $\theta$ by the algorithm (9.6). We successively construct the points $P_q^p$, for $p = 0,1,\ldots,m$ and $q = p,p+1,\ldots,m$, and finally set $P(\theta) = P_m^m$. Consider now the Bézier curve $\tilde B_m$ defined by the $m+1$ control points $P_0^0, P_1^1, \ldots, P_m^m$. The point $\tilde P(t)$ on this curve is defined by

$$\tilde P(t) = \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k} P_k^k, \tag{9.7}$$

where the parameter $t$ belongs to the interval $[0,1]$. Note that $\tilde P(0) = P_0$ and $\tilde P(1) = P_m^m$. We are now going to prove that the Bézier curve $\tilde B_m$, with end points $P_0$ and $P(\theta)$, is the part of the curve $B_m$ obtained when the parameter $t$ covers the interval $[0,\theta]$. We first check that the points $P_q^p$ generated by the algorithm (9.6) satisfy, for $1 \le p \le m$ and $p \le q \le m$,

$$P_q^p = \sum_{k=0}^{p} C_p^k\, \theta^{p-k} (1-\theta)^{k} P_{q-k}. \tag{9.8}$$


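This result is obtained by mathematical induction on $p$. Using definition (9.6) we obtain, when $p = 1$ and $1 \le q \le m$,

$$P_q^1 = \theta P_q^0 + (1-\theta) P_{q-1}^0 = \theta P_q + (1-\theta) P_{q-1}.$$

We assume that the property is true up to the value $p-1$. Then we write, for $p \le q \le m$,

$$\begin{aligned}
P_q^p &= \theta P_q^{p-1} + (1-\theta) P_{q-1}^{p-1} \\
&= \sum_{k=0}^{p-1} C_{p-1}^k\, \theta^{p-k}(1-\theta)^{k} P_{q-k} + \sum_{k=0}^{p-1} C_{p-1}^k\, \theta^{p-1-k}(1-\theta)^{k+1} P_{q-1-k} \\
&= \theta^p P_q + \sum_{k=1}^{p-1} C_{p-1}^k\, \theta^{p-k}(1-\theta)^{k} P_{q-k} + \sum_{k=0}^{p-2} C_{p-1}^k\, \theta^{p-1-k}(1-\theta)^{k+1} P_{q-1-k} + (1-\theta)^p P_{q-p}.
\end{aligned}$$

We obtain

$$\begin{aligned}
P_q^p &= \theta^p P_q + \sum_{k=1}^{p-1} C_{p-1}^{k}\, \theta^{p-k}(1-\theta)^{k} P_{q-k} + \sum_{k=1}^{p-1} C_{p-1}^{k-1}\, \theta^{p-k}(1-\theta)^{k} P_{q-k} + (1-\theta)^p P_{q-p} \\
&= \theta^p P_q + \sum_{k=1}^{p-1} \left( C_{p-1}^{k} + C_{p-1}^{k-1} \right) \theta^{p-k}(1-\theta)^{k} P_{q-k} + (1-\theta)^p P_{q-p}.
\end{aligned}$$

Finally,

$$P_q^p = \sum_{k=0}^{p} C_p^k\, \theta^{p-k}(1-\theta)^{k} P_{q-k}.$$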


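The point $\tilde P(t)$ of $\tilde B_m$, defined by the $m+1$ control points $P_0^0, P_1^1, \ldots, P_m^m$, satisfies

$$\tilde P(t) = \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k} P_k^k = \sum_{k=0}^{m} \sum_{l=0}^{k} C_m^k\, t^k (1-t)^{m-k}\, C_k^l\, \theta^{k-l}(1-\theta)^{l} P_{k-l}. \tag{9.9}$$

Let $p$ be an integer ($0 \le p \le m$). By gathering in this sum all the terms related to $P_p$ (that is, $k = p + l$), we obtain

$$\tilde P(t) = \sum_{p=0}^{m} \sum_{l=0}^{m-p} C_m^{p+l}\, C_{p+l}^{l}\, t^{p+l}(1-t)^{m-p-l}\, \theta^{p}(1-\theta)^{l} P_p.$$

Then we recall that $C_m^{p+l} C_{p+l}^{l} = C_m^p C_{m-p}^l$ and note that

$$\sum_{l=0}^{m-p} C_{m-p}^l \left( t(1-\theta) \right)^{l} (1-t)^{m-p-l} = \left( t(1-\theta) + 1 - t \right)^{m-p} = (1-\theta t)^{m-p}.$$

The formula (9.9) is then written as

$$\tilde P(t) = \sum_{p=0}^{m} C_m^p\, (\theta t)^p \sum_{l=0}^{m-p} C_{m-p}^l \left( t(1-\theta) \right)^{l} (1-t)^{m-p-l} P_p = \sum_{p=0}^{m} C_m^p\, (\theta t)^p (1-\theta t)^{m-p} P_p. \tag{9.10}$$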


For any given value of $\theta$ in $[0,1]$, the product $\theta t$ covers $[0,\theta]$ when the parameter $t$ covers $[0,1]$. Hence, when $t \in [0,1]$, the point $\tilde P(t)$ defined by (9.9) or (9.10) covers the part of the Bézier curve $B_m$ with end points $P_0$ and $P(\theta)$.
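The splitting property can be checked numerically. The following MATLAB script is a sketch under stated assumptions: the control points, the value of `theta`, and the names `L`, `R`, and `bez` are hypothetical. It stores the points $P_p^p$ (first part) while running (9.6) and, as a convenience that is standard for de Casteljau subdivision but not spelled out in the text, also stores the points $P_m^p$, which are control points of the remaining part of the curve.

```matlab
% Hypothetical control points (one per row) and splitting parameter
P = [0 0; 1 2; 3 3; 4 0];
theta = 0.4;
m = size(P,1) - 1;

% Run (9.6) with t = theta; keep P_p^p (part [0,theta]) and P_m^p (part [theta,1])
Q = P;  L = P;  R = P;
for p = 1:m
    Q(p+1:m+1,:) = theta*Q(p+1:m+1,:) + (1-theta)*Q(p:m,:);
    L(p+1,:)   = Q(p+1,:);      % P_p^p
    R(m+1-p,:) = Q(m+1,:);      % P_m^p, stored from P(theta) to P_m
end

% Direct Bernstein evaluation (9.1), used only to check the result
bez = @(C,t) (arrayfun(@(j) nchoosek(m,j), 0:m) .* t.^(0:m) .* (1-t).^(m-(0:m))) * C;
t = 0.7;
errL = norm(bez(L,t) - bez(P, theta*t));                 % compare with (9.10)
errR = norm(bez(R,t) - bez(P, theta + (1-theta)*t));
disp([errL errR])               % both expected to be at round-off level
```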

How do we get the complementary part of the curve? By reversing the order of the control points and by changing $\theta$ into $1-\theta$. Indeed, the point $P(\theta)$ of the Bézier curve with control points $P_0, P_1, \ldots, P_m$ is defined according to

$$P(\theta) = \sum_{k=0}^{m} C_m^k\, \theta^k (1-\theta)^{m-k} P_k = \sum_{k=0}^{m} C_m^k\, (1-\theta)^k \theta^{m-k} P_{m-k}.$$

Then $P(\theta)$ is the same point as the point $Q(1-\theta)$ of the Bézier curve with control points $P_m, P_{m-1}, \ldots, P_0$. In order to obtain the part of the curve with end points $P(\theta)$ and $P_m$, we first generate the points $Q_0^0, Q_1^1, \ldots, Q_m^m$ associated with the Bézier curve with control points $P_m, P_{m-1}, \ldots, P_0$. Then we set

$$\tilde Q(t) = \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k} Q_k^k.$$


Remark 9.4. This result may be generalized to B-spline curves for which more 
properties can be established in relation to basic operations such as moving, 
removing and inserting a control point. 


9.6 Intersection of Bézier Curves 


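We now address the problem of finding the intersection of two Bézier curves $B_m$ and $B'_{m'}$, defined in $\mathbb{R}^2$ by their control points. We describe the two curves by their generic points

$$\begin{aligned}
P(t) &= \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k} P_k, \\
P'(t') &= \sum_{k'=0}^{m'} C_{m'}^{k'}\, t'^{k'} (1-t')^{m'-k'} P'_{k'}.
\end{aligned} \tag{9.11}$$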


Now, is it possible to find two values $t$ and $t'$ such that $P(t) = P'(t')$? How do we compute them when they exist? According to the theory of algebraic geometry, one can deduce implicit representations $f(x,y)$ and $f'(x,y)$ of both curves $B_m$ and $B'_{m'}$. But within the corresponding formulation, searching for a possible common point is equivalent to finding the roots of a polynomial of degree m + m'. This is too complicated and time-consuming a way to get the solution. We propose here a method based on particular properties of Bézier curves. We proceed as follows: since any Bézier curve $B_m$ is entirely contained in the convex hull $\mathcal{E}_m$ of its control points, we know that the intersection set $B_m \cap B'_{m'}$ is empty when the convex hulls $\mathcal{E}_m$ and $\mathcal{E}'_{m'}$ do not intersect. Conversely, the curves $B_m$ and $B'_{m'}$ may intersect when the convex hulls intersect; in order to obtain a more accurate view of the problem in this case, we can split both curves into two parts and check the intersection of the corresponding convex hulls. Splitting a Bézier curve $B_m$ into two subcurves $B_m^1$ and $B_m^2$, with their associated control points, sends us back to the previous section. We denote by $B_m^1$ the part of the curve $B_m$ obtained for $t \in [0,0.5]$, while $B_m^2$ corresponds to the part of $B_m$ obtained for $t \in [0.5,1]$. The control points of both curves $B_m^1$ and $B_m^2$ are defined by (9.8). We then proceed by successive iterations as long as there exist two intersecting convex hulls. The corresponding algorithm is the following:


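    initialization:
        $\mathcal{E}_0 = \mathcal{E}_m$, $\mathcal{E}'_0 = \mathcal{E}'_{m'}$
    iterations: while $\mathcal{E}_k \cap \mathcal{E}'_k \neq \emptyset$, do
        split: $B_k = B_{k,1} \cup B_{k,2}$
        associate: $\mathcal{E}_{k,1}$, $\mathcal{E}_{k,2}$                    (9.12)
        split: $B'_k = B'_{k,1} \cup B'_{k,2}$
        associate: $\mathcal{E}'_{k,1}$, $\mathcal{E}'_{k,2}$
        check: $\mathcal{E}_{k,i} \cap \mathcal{E}'_{k,j}$                     (+)
    end do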


The intersection of two bounded convex sets is a bounded convex set, so $\mathcal{E}_{k,i} \cap \mathcal{E}'_{k,j}$ is convex and contained in both $\mathcal{E}_{k,i}$ and $\mathcal{E}'_{k,j}$; thus its "size" decreases as the algorithm (9.12) evolves. This algorithm converges in the following way: either all intersections are empty, and then the curves $B_m$ and $B'_{m'}$ do not intersect, or there exists at least one intersection whose size vanishes as (9.12) proceeds; then the convex intersection shrinks to a point common to both curves. Note that algorithm (9.12) is able to cope with multiple intersections.

In order to check automatically whether the sets $\mathcal{E}_k$ and $\mathcal{E}'_k$ intersect, we need to compute the convex hull of $m+1$ given points in $\mathbb{R}^2$. Since $\mathcal{E}_k$ is bounded by a convex polygon $\mathcal{P}_k$, its computer representation is a mesh of $\mathcal{P}_k$. This mesh $\mathcal{M}_k$ may be any set of triangles whose union is equal to $\mathcal{P}_k$. Their vertices are the control points and they satisfy the classical rule that the intersection of two distinct triangles is either empty or reduced to a common vertex or a common edge. We can then check whether $\mathcal{E}_k$ and $\mathcal{E}'_k$ intersect by testing the intersection of all pairs of triangles $(T_{k,i}, T'_{k,j}) \in \mathcal{M}_k \times \mathcal{M}'_k$.

Although this method is correct, it is too tedious for our purpose. For the sake of simplicity, we proceed as follows: each convex hull $\mathcal{E}_k$ is embedded in a rectangle $R_k = [x_k^{\min}, x_k^{\max}] \times [y_k^{\min}, y_k^{\max}]$ defined by the extreme values of the coordinates of the control points. Then we replace in (9.12) the line marked by (+) by: check $R_{k,i} \cap R'_{k,j}$. Since $\mathcal{E}_k \subset R_k$, this may slow down the convergence of (9.12), but the modified algorithm is very simple to implement (the intersection of two rectangles, when it exists, is a rectangle that is easy to compute).
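The simplified version of (9.12) based on bounding rectangles might be sketched as follows. This MATLAB script is an illustration, not the procedure of Sect. 9.10: the control points `P` and `Pp`, the tolerance `tol`, the depth-first stack of subcurve pairs, and the rule used to discard repeated detections of the same point are all assumptions made for the example. A pair is accepted here only when both rectangles are smaller than `tol`, which is slightly stricter than testing the length or the width alone.

```matlab
% Two planar cubic Bezier curves (hypothetical control points, one per row)
P  = [0 0; 1 2; 2 -1; 3 1];
Pp = [0 1; 1 -1; 2 2; 3 0];
tol = 1e-6;                         % smallest acceptable rectangle size

% bounding rectangle [xmin xmax ymin ymax] of a set of control points
rect = @(C) [min(C(:,1)) max(C(:,1)) min(C(:,2)) max(C(:,2))];
% true when two such rectangles have a nonempty intersection
overlap = @(a,b) a(1) <= b(2) && b(1) <= a(2) && a(3) <= b(4) && b(3) <= a(4);

stack = {P, Pp};                    % pairs of subcurves still to be examined
S = zeros(0,2);                     % intersection points found so far
while ~isempty(stack)
    A = stack{end,1};  B = stack{end,2};  stack(end,:) = [];
    ra = rect(A);  rb = rect(B);
    if ~overlap(ra, rb)
        continue                    % disjoint rectangles: no common point
    end
    if max([ra(2)-ra(1), ra(4)-ra(3), rb(2)-rb(1), rb(4)-rb(3)]) < tol
        % both rectangles are tiny: take the center of their intersection
        s = [max(ra(1),rb(1)) + min(ra(2),rb(2)), ...
             max(ra(3),rb(3)) + min(ra(4),rb(4))] / 2;
        d = sqrt(sum(bsxfun(@minus, S, s).^2, 2));
        if isempty(d) || min(d) > 100*tol    % skip copies of a known point
            S(end+1,:) = s;
        end
        continue
    end
    % split each curve at t = 0.5 with de Casteljau's algorithm
    halves = cell(2,2);  C = {A, B};
    for c = 1:2
        Q = C{c};  m = size(Q,1) - 1;  Lc = Q;  Rc = Q;
        for p = 1:m
            Q(p+1:m+1,:) = 0.5*(Q(p+1:m+1,:) + Q(p:m,:));
            Lc(p+1,:)   = Q(p+1,:);          % part t in [0,0.5]
            Rc(m+1-p,:) = Q(m+1,:);          % part t in [0.5,1]
        end
        halves(c,:) = {Lc, Rc};
    end
    for i = 1:2
        for j = 1:2
            stack(end+1,:) = {halves{1,i}, halves{2,j}};
        end
    end
end
S = sortrows(S);                    % intersection points, sorted by abscissa
disp(S)
```

The two curves of this example are mirror images of each other with respect to the line y = 0.5 and share the same abscissa x = 3t, so their common points can be checked by hand from the roots of y(t) = 0.5.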

Stopping criterion: it is necessary to stop the iterations in algorithm (9.12). An efficient way to check the accuracy of the result is to set a smallest acceptable size $\varepsilon$ for the rectangle $R = R_k \cap R'_k$. When the length (or the width) of $R$ is smaller than $\varepsilon$, we define the common point $S = B_m \cap B'_{m'}$ as the intersection of the two diagonals of $R$. Finally, once $S$ has been located, it remains to place it on the Bézier curve $B_m$; in other words, we have to compute the corresponding value of the parameter $t$ such that

$$S = \sum_{k=0}^{m} C_m^k\, t^k (1-t)^{m-k} P_k.$$





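This value is obtained by a linear approximation:

$$t = \frac{x(S) - x(M_1)}{x(M_2) - x(M_1)}\, t_2 + \frac{x(M_2) - x(S)}{x(M_2) - x(M_1)}\, t_1, \tag{9.13}$$

where $M_1 = P(t_1)$ and $M_2 = P(t_2)$ denote the end points of the last subcurve of $B_m$ considered. We proceed in the same way to compute the value of $t'$ such that $S = \sum_{k'=0}^{m'} C_{m'}^{k'}\, t'^{k'} (1-t')^{m'-k'} P'_{k'}$:

$$t' = \frac{x(S) - x(M'_1)}{x(M'_2) - x(M'_1)}\, t'_2 + \frac{x(M'_2) - x(S)}{x(M'_2) - x(M'_1)}\, t'_1. \tag{9.14}$$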


Remark 9.5. We use the ordinates $y(M_1)$ and $y(M_2)$ in (9.13) when $x(M_1) = x(M_2)$.


Remark 9.6. The stopping criterion may be modified by computing the distance between the curve and the straight line $M_1 M_2$, instead of the size of the rectangle $R$. This is obtained via an approximation of the curvature. For example, if $(x_k, y_k)$ are the coordinates of sampling points of $B_m$, we may use a value of $h$, defined by

$$h = \max\left( |x_{k-1} - 2x_k + x_{k+1}|,\; |y_{k-1} - 2y_k + y_{k+1}| \right).$$
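As an illustration of Remark 9.6, the quantity $h$ can be computed from a set of sampling points with second differences. The sample data below (a parabola arc) are hypothetical, and taking the maximum over all interior indices $k$ is an assumption of this sketch.

```matlab
% Hypothetical sampling points: an arc of the parabola y = x^2
x = linspace(0, 1, 5);
y = x.^2;

k = 2:numel(x)-1;                       % interior sampling indices
dx2 = abs(x(k-1) - 2*x(k) + x(k+1));    % second differences of the abscissas
dy2 = abs(y(k-1) - 2*y(k) + y(k+1));    % second differences of the ordinates
h = max(max(dx2), max(dy2));            % flatness indicator of Remark 9.6
disp(h)
```

For samples taken on a straight line with uniform spacing, both second differences vanish and $h$ is zero, which is what makes $h$ usable as a flatness test.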


9.6.1 Implementation 


Exercise 9.2. 1. Compute and display two Bézier curves $B_m$ and $B'_{m'}$, defined in $\mathbb{R}^2$.
2. Implement the algorithm (9.12) in its simplified formulation. Check the 
intersection of the curves. 
3. Compute the values of t and t’ according to (9.13) and (9.14). Display the 
corresponding points S(t) and S’(t’). Compare to the values obtained by 
(9.12). 


A solution of this exercise is proposed in Sect. 9.10 at page 211. 

Fig. 9.7. Intersection of Bézier curves (global view).


9.7 Bézier Surfaces 





Similar to Bézier curves, a pleasant and easy-to-handle representation of surfaces is obtained using two parameters $t_1$ and $t_2$. Let $m_1$ and $m_2$ be two positive integers, and consider $(m_1+1) \times (m_2+1)$ control points $P_{k_1,k_2} \in \mathbb{R}^3$. For all $(t_1,t_2) \in [0,1] \times [0,1]$, we define the point $P(t_1,t_2)$ by the relation

$$P(t_1,t_2) = \sum_{k_1=0}^{m_1} \sum_{k_2=0}^{m_2} C_{m_1}^{k_1} C_{m_2}^{k_2}\, t_1^{k_1}(1-t_1)^{m_1-k_1}\, t_2^{k_2}(1-t_2)^{m_2-k_2}\, P_{k_1,k_2}. \tag{9.15}$$

As $(t_1,t_2)$ ranges over $[0,1] \times [0,1]$, the corresponding point $P(t_1,t_2)$ covers a surface, referred to as a Bézier patch by CAGD specialists (see Fig. 9.8). A Bézier surface is then the union of Bézier patches (see Fig. 9.9).





Remark 9.7. There is no assumption made on the layout of the points $P_{k_1,k_2}$ in using definition (9.15). Nevertheless, this layout has a significant influence on the final rendering of the generated surface. In order to modify the surface by moving some control points, it is easier to use points $P_{k_1,k_2}$ situated on a rectangular grid. The resulting surface is called a rectangular patch, as opposed to a triangular patch, whose control points are situated on a triangular grid and whose generating point is defined by

$$P(t_1,t_2,t_3) = \sum_{k_1+k_2+k_3=m} \frac{m!}{k_1!\,k_2!\,k_3!}\, t_1^{k_1} t_2^{k_2} t_3^{k_3}\, P_{k_1,k_2,k_3}. \tag{9.16}$$


9.8 Basic Properties of Bézier Surfaces


9.8.1 Convex Hull 


The point $P(t_1,t_2)$ is the barycenter of the $(m_1+1) \times (m_2+1)$ control points $P_{k_1,k_2}$, with the corresponding weights $B_{m_1}^{k_1}(t_1)\, B_{m_2}^{k_2}(t_2)$. It follows from (9.2) that $P(t_1,t_2)$ lies in the convex hull of the points $P_{k_1,k_2}$, which is a volume with polygonal faces.

Fig. 9.8. Bézier patch. Fig. 9.9. $C^0$ continuity.


9.8.2 Tangent Vector 


The tangent plane to the Bézier surface at the point $P(t_1,t_2)$ is defined by the two tangent vectors $\tau_1(t_1,t_2) = \frac{\partial}{\partial t_1} P(t_1,t_2)$ and $\tau_2(t_1,t_2) = \frac{\partial}{\partial t_2} P(t_1,t_2)$. Consequently, when $P_{0,0} \neq P_{1,0}$ and $P_{0,0} \neq P_{0,1}$, the Bézier surface is tangent to the triangle $P_{1,0} P_{0,0} P_{0,1}$ at the point $P_{0,0}$. The same property holds for the vertices $P_{m_1,0}$, $P_{0,m_2}$, and $P_{m_1,m_2}$.


9.8.3 Junction of Bézier Patches 


Let $S_{m_1,m_2}$ be the Bézier patch defined by the $(m_1+1) \times (m_2+1)$ control points $P_{k_1,k_2}$ ($0 \le k_1 \le m_1$, $0 \le k_2 \le m_2$), and $S'_{m'_1,m'_2}$ the Bézier patch defined by the $(m'_1+1) \times (m'_2+1)$ control points $P'_{k'_1,k'_2}$ ($0 \le k'_1 \le m'_1$, $0 \le k'_2 \le m'_2$). We now address the problem of connecting these two patches.

The simplest junction corresponds to so-called $C^0$ continuity, supposing that there exists a common curve located on the rim of the surfaces. Such a curve corresponds to an extreme value of $t_1$ or $t_2$ (namely 0 or 1). According to definition (9.15) and properties (9.2), such a curve is a Bézier curve. The $C^0$ continuity is then satisfied when the corresponding Bézier curves fit; this is true, for example, when the control points are identical on both curves.

Now we look further for a better rendering of the junction, and suppose that $P_{m_1,k_2} = P'_{0,k_2}$ for $k_2 = 0,1,\ldots,m_2$. A $G^1$ connection is obtained when the vectors $\tau_1(1,t_2) = \frac{\partial}{\partial t_1} P(1,t_2)$ and $\tau'_1(0,t_2) = \frac{\partial}{\partial t_1} P'(0,t_2)$ are collinear at any common point. This condition is equivalent to saying that $P_{m_1,k_2} = P'_{0,k_2}$ is the midpoint of $P_{m_1-1,k_2} P'_{1,k_2}$ for $k_2 = 0,1,\ldots,m_2$. Furthermore, when the tangent vectors $\tau_2(1,t_2) = \frac{\partial}{\partial t_2} P(1,t_2)$ and $\tau'_2(0,t_2) = \frac{\partial}{\partial t_2} P'(0,t_2)$ are identical at any common point, a $C^1$ connection is realized. Examples illustrating the different connections are displayed in Figs. 9.9 and 9.10.

Fig. 9.10. Junction of patches. (a) $G^1$ continuity; (b) $C^1$ continuity.


9.8.4 Construction of the Point P(t) 


According to definition (9.15), any point $P(t_1,t_2)$ may be computed by an algorithm similar to (9.6). We write

$$P(t_1,t_2) = \sum_{k_1=0}^{m_1} C_{m_1}^{k_1}\, t_1^{k_1}(1-t_1)^{m_1-k_1}\, P_{k_1}, \tag{9.17}$$

with

$$P_{k_1} = \sum_{k_2=0}^{m_2} C_{m_2}^{k_2}\, t_2^{k_2}(1-t_2)^{m_2-k_2}\, P_{k_1,k_2}. \tag{9.18}$$

Let $k_1$ be a given integer. According to (9.17)-(9.18), the point $P_{k_1}$ appears to be the point $P_{k_1}(t_2)$ of the Bézier curve defined by the $m_2+1$ control points $P_{k_1,0}, P_{k_1,1}, \ldots, P_{k_1,m_2}$. It is then possible to calculate its coordinates by means of de Casteljau's algorithm (9.6). Once the $m_1+1$ points $P_{k_1}$ are computed, a new application of (9.6) generates the point $P(t_1,t_2)$ on the Bézier patch. The whole patch is then defined as the union of all rectangular faces generated by the vertices $P(t_1,t_2)$, $P(t_1+\Delta t_1,t_2)$, $P(t_1+\Delta t_1,t_2+\Delta t_2)$, and $P(t_1,t_2+\Delta t_2)$ (here $\Delta t_1$ and $\Delta t_2$ are the sampling step sizes for $t_1$ and $t_2$). The corresponding construction algorithm (9.19) is called the de Boor-Coox algorithm:

    for p1 = 0, initialization:
    for q1 = 0,1,...,m1, do
        construction of point $P_{q_1}$ by (9.6)
        for p2 = 0, initialization:
            for q2 = 0,1,...,m2, do
                $P^{0}_{q_1,q_2} = P_{q_1,q_2}$
            end do
        end do
        for p2 = 1,2,...,m2, do
            for q2 = p2,p2+1,...,m2, do
                $P^{p_2}_{q_1,q_2} = t_2\,P^{p_2-1}_{q_1,q_2} + (1-t_2)\,P^{p_2-1}_{q_1,q_2-1}$        (9.19)
            end do
        end do: set $P_{q_1} = P^{m_2}_{q_1,m_2}$
        end of computation of $P_{q_1}$, set: $P^{0}_{q_1} = P_{q_1}$
    end do
    for p1 = 1,2,...,m1, do
        for q1 = p1,p1+1,...,m1, do
            $P^{p_1}_{q_1} = t_1\,P^{p_1-1}_{q_1} + (1-t_1)\,P^{p_1-1}_{q_1-1}$
        end do
    end do: set $P(t_1,t_2) = P^{m_1}_{m_1}$


9.9 Construction of Bézier Surfaces 


Exercise 9.3. 1. Choose $(m_1+1) \times (m_2+1)$ points $P_{0,0}, P_{1,0}, \ldots, P_{m_1,m_2}$ in $\mathbb{R}^3$.

2. Write a general procedure generating the point $P(t_1,t_2)$ of the Bézier
patch with control points $P_{0,0}, P_{1,0}, \ldots, P_{m_1,m_2}$, using the de Boor-Coox
algorithm for any value $(t_1,t_2) \in [0,1] \times [0,1]$.

3. Compute and display the corresponding Bézier patch.

4. Repeat the computation for different values of $(m_1, m_2)$ and different sets
of control points.

5. Check the different continuity conditions.


A solution of this exercise is proposed in Sect. 9.10 at page 211. 


Exercise 9.4. (for the brave) Extend the method for computing the intersec- 
tion of two curves to compute the intersection of Bézier surfaces. Write and 
check the corresponding procedure. 


9.10 Solutions and Programs 


Solution of Exercise 9.1 


The file CAGD_ex1.m contains the procedure CAGD_ex1, which defines a set of 
control points, and builds and displays the resulting Bézier curve, as shown in 
Fig. 9.1(a). This procedure calls the function CAGD_cbezier, which computes 
a set of sampling points, and the function CAGD_casteljau, which builds a
point according to de Casteljau's algorithm (9.6).


Listing 9.1 (CAGD_casteljau.m) 


function [x,y]=CAGD_casteljau(t,XP,YP)
%%
%% Construction of a point of a Bezier curve
%% according to de Casteljau's algorithm
%%
m=size(XP,2)-1;          % degree of the curve (XP, YP are row vectors)
xx=XP; yy=YP;
for kk=1:m
    xxx=xx; yyy=yy;      % copy of the points of the previous step
    for k=kk:m
        xx(k+1)=(1-t)*xxx(k)+t*xxx(k+1);
        yy(k+1)=(1-t)*yyy(k)+t*yyy(k+1);
    end
end
x=xx(m+1); y=yy(m+1);    % last point built is P(t)


The procedure CAGD_ex1 then calls the function CAGD_tbezier, which displays the Bézier curve and its control points. The control polygon, as shown in Fig. 9.2, is obtained by calling the procedures CAGD_ex1b and CAGD_pbezier. Exchanging two of the control points generates a different curve, with a nonconvex control polygon (see procedure CAGD_ex1c).


Procedures CAGD_connectCC0, CAGD_connectCG1, and CAGD_connectCC1 plot examples of $C^0$, $G^1$, and $C^1$ continuity.

Solution of Exercise 9.2


The procedure CAGD_ex2 defines two sets of control points, and builds and dis- 
plays both corresponding Bézier curves. Each curve is then located within a 
rectangle by the function CAGD_drectan. A possible intersection of these rect- 
angles is then tested by the procedure CAGD_rbezier. When this is successful, 
the function CAGD_dbezier is used; this procedure is an implementation of al- 
gorithm (9.12), seeking iteratively the intersection of both curves. When the 
rectangular intersection is small enough, the coordinates of the intersection 
point are computed; then an approximation of the corresponding value of the 
parameter t is obtained from (9.13) and (9.14). 

Finally, the procedure calls the function CAGD_cbezier, which computes a 
sampling of points of a Bézier curve, using the functions CAGD_casteljau and 
CAGD_tbezier. The resulting curve and control points are then displayed. 


Solution of Exercise 9.3 


The procedure CAGD_ex3 defines a set of control points, and builds and displays 
the corresponding Bézier patch, as represented in Fig. 9.8 (you can use the 
“Rotate 3D” button to obtain a global view of the surface). This procedure 
calls the functions CAGD_sbezier, which gives a sampling of points of a Bézier 
surface; CAGD_coox; and CAGD_ubezier, which finally displays both the surface 
and its control points. The procedure CAGD_coox builds one point of a Bézier 
surface according to the de Boor—Coox algorithm (9.19). 


Listing 9.2 (CAGD_coox.m) 





function [x,y,z]=CAGD_coox(t1,t2,XP,YP,ZP)
%%
%% Construction of a point on a Bezier surface
%% according to the de Boor--Coox algorithm
%%
np1=size(XP,1); np2=size(XP,2);
xx1=zeros(np1,1); yy1=zeros(np1,1); zz1=zeros(np1,1);
for k1=1:np1
    % control points of the curve associated with row k1
    xx2=zeros(np2,1); yy2=zeros(np2,1); zz2=zeros(np2,1);
    for k2=1:np2
        xx2(k2)=XP(k1,k2); yy2(k2)=YP(k1,k2); zz2(k2)=ZP(k1,k2);
    end
    % de Casteljau in the t2 direction
    [x,y,z]=CAGD_cast3d(t2,xx2,yy2,zz2);
    xx1(k1)=x; yy1(k1)=y; zz1(k1)=z;
end
% de Casteljau in the t1 direction
[x,y,z]=CAGD_cast3d(t1,xx1,yy1,zz1);
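
Listing 9.2 relies on the function CAGD_cast3d, which is not reproduced in this section. As a self-contained illustration of the same two-stage idea, the script below evaluates one point of a patch without that function and compares it with the double sum (9.15). It is a sketch under stated assumptions: the control grid, the values of `t1` and `t2`, and all variable names are hypothetical.

```matlab
% Hypothetical 3 x 4 grid of control points in R^3 (m1 = 2, m2 = 3)
[K2, K1] = meshgrid(0:3, 0:2);
XP = K1;  YP = K2;  ZP = sin(K1) .* cos(K2);
t1 = 0.3;  t2 = 0.8;
CP = cat(3, XP, YP, ZP);            % CP(k1+1,k2+1,:) holds P_{k1,k2}
[n1, n2, ~] = size(CP);

% Stage 1: for each k1, de Casteljau in t2 gives the point P_{k1} of (9.18)
Q1 = zeros(n1, 3);
for k1 = 1:n1
    Q = reshape(CP(k1,:,:), n2, 3);
    for p = 1:n2-1
        Q(p+1:n2,:) = t2*Q(p+1:n2,:) + (1-t2)*Q(p:n2-1,:);
    end
    Q1(k1,:) = Q(n2,:);
end
% Stage 2: de Casteljau in t1 on the n1 intermediate points, see (9.17)
for p = 1:n1-1
    Q1(p+1:n1,:) = t1*Q1(p+1:n1,:) + (1-t1)*Q1(p:n1-1,:);
end
Pt = Q1(n1,:);

% Cross-check with the tensor-product Bernstein form (9.15)
bern = @(m,t) arrayfun(@(k) nchoosek(m,k), 0:m) .* t.^(0:m) .* (1-t).^(m-(0:m));
W = bern(n1-1,t1)' * bern(n2-1,t2); % weights of the control points
Pt_direct = [sum(sum(W.*XP)), sum(sum(W.*YP)), sum(sum(W.*ZP))];
disp(norm(Pt - Pt_direct))          % expected to be at round-off level
```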

The procedures CAGD_connectSC0, CAGD_connectSG1, and CAGD_connectSC1 display examples of $C^0$ (respectively $G^1$ and $C^1$) continuity. The results obtained correspond to Figs. 9.9 and 9.10.

Gas Dynamics: The Riemann Problem and
Discontinuous Solutions: Application to the 
Shock Tube Problem 


Project Summary 


Level of difficulty: 3 


Keywords: Nonlinear hyperbolic systems, Euler equations for gas 
dynamics, centered schemes: Lax—Wendroff, MacCor- 
mack; upwind schemes: Godunov, Roe 


Application fields: Shock tube, supersonic flows 


The interest in studying the shock tube problem is threefold. From a fun- 
damental point of view, it offers an interesting framework to introduce some 
basic notions about nonlinear hyperbolic systems of partial differential equa- 
tions (PDEs). From a numerical point of view, this problem constitutes, since 
the exact solution is known, an inevitable and difficult test case for any numer- 
ical method dealing with discontinuous solutions. Finally, there is a practical 
interest, since this model is used to describe real shock tube experimental 
devices.¹


10.1 Physical Description of the Shock Tube Problem 


The fundamental idea of the shock tube is the following: consider a long one-dimensional (1D) tube, closed at its ends and divided into two equal regions by a thin diaphragm (see Fig. 10.1). Each region is filled with the same gas, but with different thermodynamic parameters (pressure, density, and temperature). The region with the higher pressure is called the driven section of the tube, while the low-pressure part is the working section. The gas being initially at rest, the sudden breakdown of the diaphragm generates a high-speed flow, which propagates in the working section (this is the place where the model of a free-flying object, such as a supersonic aircraft, will be placed).

¹ The first shock tube facility was built in 1899 by Paul Vieille to study the deflagration of explosive charges. Nowadays, shock tubes are used as low-cost high-speed wind tunnels, in which a wide variety of aerodynamic or aeroballistic topics are studied: supersonic aircraft flight, gun performance, asteroid impacts, shuttle atmospheric entry, etc.











[Figure: at t = 0, driven section ($p_L$, $T_L$, $U_L = 0$) and working section ($p_R$, $T_R$, $U_R = 0$); at t > 0, expansion wave, contact discontinuity, and shock wave.]

Fig. 10.1. Sketch of the initial configuration of the shock tube (t = 0) and waves propagating in the tube after the diaphragm breakdown (t > 0).


Let us get into a more detailed analysis of the problem. Consider (Fig. 10.1) that the left part of the tube is the driven section, defined by the pressure $p_L$, the density $\rho_L$, the temperature $T_L$, and the initial velocity $U_L = 0$. Similarly, the parameters of the (right part) working section are $p_R < p_L$, $\rho_R$, $T_R$, and $U_R = 0$.

At time t = 0 the diaphragm breaks, generating a process that naturally 
tends to equalize the pressure in the tube. The gas at high pressure expands 
through an expansion (or rarefaction) wave and flows into the working section, 
pushing the gas of this part. The rarefaction is a continuous process and takes 
place inside a well-defined region (the expansion fan) that propagates to the 
left (region (E) in Fig. 10.1); the width of the expansion fan grows in time. 

The compression of the low-pressure gas generates a shock wave propagat- 
ing to the right. The expanded gas is separated from the compressed gas by a 
contact discontinuity, which can be regarded as a fictitious membrane travel- 
ing to the right at constant speed. At this point of our simplified description, 
we just note that some of the physical functions defining the flow in the tube 
($p(x)$, $\rho(x)$, $T(x)$, and $U(x)$) are discontinuous across the shock wave and the
contact discontinuity. These discontinuities, which cause the difficulty of the 
problem, will be described in greater detail in the following sections. 

10.2 Euler Equations of Gas Dynamics


To simplify the mathematical description of the shock tube problem we con- 
sider an infinitely long tube (to avoid reflections at the tube ends) and neglect 
viscous effects in the flow. We also suppose that the diaphragm is completely 
removed from the flow at t = 0. Under these simplifying hypotheses, the 
compressible flow in the shock tube is described by the one-dimensional (1D) 
Euler system of PDEs (see, for instance, Hirsch, 1988; LeVeque, 1992) 


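$$\frac{\partial}{\partial t} \underbrace{\begin{pmatrix} \rho \\ \rho U \\ E \end{pmatrix}}_{W(x,t)} + \frac{\partial}{\partial x} \underbrace{\begin{pmatrix} \rho U \\ \rho U^2 + p \\ (E+p)U \end{pmatrix}}_{F(W)} = 0, \tag{10.1}$$

where $\rho$ is the density of the gas and $E$ the total energy:

$$E = \frac{p}{\gamma-1} + \frac{\rho}{2} U^2. \tag{10.2}$$

To close this system of equations, we need to write the constitutive law of the gas (or equation of state). Considering the perfect gas model, the equation of state is

$$p = \rho R T. \tag{10.3}$$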





The constants $R$ and $\gamma$ characterize the thermodynamic properties of the gas ($R$ is the universal gas constant divided by the molecular mass and $\gamma$ is the ratio of specific heat coefficients). It is also useful to define the local speed of sound $a$, the Mach number $M$, and the total enthalpy $H$:

$$a = \sqrt{\gamma R T} = \sqrt{\gamma \frac{p}{\rho}}, \qquad M = \frac{U}{a}, \qquad H = \frac{E+p}{\rho} = \frac{a^2}{\gamma-1} + \frac{1}{2} U^2. \tag{10.4}$$





Considering the column vector of unknowns $W = (\rho, \rho U, E)^t$, the Euler system of equations (10.1) can be written in the following conservative form:

$$\frac{\partial W}{\partial t} + \frac{\partial}{\partial x} F(W) = 0, \tag{10.5}$$

with the initial condition (we denote by $x_0$ the abscissa of the diaphragm):

$$W(x,0) = \begin{cases} (\rho_L, \rho_L U_L, E_L)^t, & x \le x_0, \\ (\rho_R, \rho_R U_R, E_R)^t, & x > x_0. \end{cases} \tag{10.6}$$

The vector $W$ contains the conserved variables and $F(W)$ the conserved fluxes. Note that with this choice of the vector of unknowns $W$, the pressure is not an unknown, since it can be derived from (10.2) using the components of $W$.
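
The last remark can be made concrete with a short MATLAB sketch that recovers the pressure from the components of $W$ by inverting (10.2) and then assembles the flux $F(W)$ of (10.1). The numerical state, the value $\gamma = 1.4$, and the function handles `pressure` and `flux` are assumptions introduced for this illustration.

```matlab
gamma = 1.4;                         % ratio of specific heats (assumed value)
rho = 1.2;  U = 0.7;  p = 0.9;       % hypothetical primitive state
E = p/(gamma-1) + 0.5*rho*U^2;       % total energy (10.2)
W = [rho; rho*U; E];                 % vector of conserved variables

% pressure recovered from the components of W, inverting (10.2)
pressure = @(W) (gamma-1) * (W(3) - 0.5*W(2)^2/W(1));
% flux vector F(W) of (10.1), written in terms of W only
flux = @(W) [W(2); W(2)^2/W(1) + pressure(W); (W(3) + pressure(W))*W(2)/W(1)];

F = flux(W);
disp([pressure(W), F'])
```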
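
The mathematical analysis of the Euler system of PDEs usually considers its quasilinear form:²

$$\frac{\partial W}{\partial t} + A\, \frac{\partial W}{\partial x} = 0, \tag{10.7}$$

with the Jacobian matrix

$$A = \frac{\partial F}{\partial W} = \begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{2}(\gamma-3)U^2 & (3-\gamma)U & \gamma-1 \\ (\gamma-1)U^3 - \gamma \frac{UE}{\rho} & \gamma \frac{E}{\rho} - \frac{3}{2}(\gamma-1)U^2 & \gamma U \end{pmatrix}. \tag{10.8}$$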

It is interesting to note that the matrix A satisfies the following remarkable 
relationship: 


AW = F(W). (10.9) 


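Furthermore, we can easily calculate its eigenvalues

$$\lambda^0 = U, \qquad \lambda^+ = U + a, \qquad \lambda^- = U - a, \tag{10.10}$$

and the corresponding eigenvectors

$$v^0 = \begin{pmatrix} 1 \\ U \\ \frac{1}{2}U^2 \end{pmatrix}, \qquad v^+ = \begin{pmatrix} 1 \\ U+a \\ H + aU \end{pmatrix}, \qquad v^- = \begin{pmatrix} 1 \\ U-a \\ H - aU \end{pmatrix}. \tag{10.11}$$

We conclude that the Jacobian matrix $A$ is diagonalizable, i.e., it can be decomposed as $A = P \Lambda P^{-1}$, where

$$\Lambda = \begin{pmatrix} U-a & 0 & 0 \\ 0 & U & 0 \\ 0 & 0 & U+a \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 1 & 1 \\ U-a & U & U+a \\ H-aU & \frac{1}{2}U^2 & H+aU \end{pmatrix}. \tag{10.12}$$

We can easily verify that

$$P^{-1} = \begin{pmatrix} \frac{1}{2}\left(\alpha_1 + \frac{U}{a}\right) & -\frac{1}{2}\left(\alpha_2 U + \frac{1}{a}\right) & \frac{\alpha_2}{2} \\ 1 - \alpha_1 & \alpha_2 U & -\alpha_2 \\ \frac{1}{2}\left(\alpha_1 - \frac{U}{a}\right) & -\frac{1}{2}\left(\alpha_2 U - \frac{1}{a}\right) & \frac{\alpha_2}{2} \end{pmatrix}, \tag{10.13}$$

where $\alpha_1 = (\gamma-1)U^2/(2a^2)$ and $\alpha_2 = (\gamma-1)/a^2$.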
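
Relations (10.8)-(10.13) can be checked numerically for a sample state. The MATLAB script below is a minimal sketch: the state values and $\gamma = 1.4$ are hypothetical, and the matrices are typed in from the standard expressions for a perfect gas, so the script only verifies that they are mutually consistent.

```matlab
gamma = 1.4;  rho = 1.2;  U = 0.7;  p = 0.9;   % hypothetical state
E = p/(gamma-1) + 0.5*rho*U^2;                 % total energy (10.2)
a = sqrt(gamma*p/rho);                         % speed of sound (10.4)
H = (E + p)/rho;                               % total enthalpy (10.4)
W = [rho; rho*U; E];
F = [rho*U; rho*U^2 + p; (E + p)*U];

% Jacobian matrix (10.8)
A = [0, 1, 0;
     0.5*(gamma-3)*U^2, (3-gamma)*U, gamma-1;
     (gamma-1)*U^3 - gamma*U*E/rho, gamma*E/rho - 1.5*(gamma-1)*U^2, gamma*U];

% eigenvalues and eigenvectors, ordered as in (10.12)
Lam = diag([U-a, U, U+a]);
P = [1, 1, 1; U-a, U, U+a; H-a*U, 0.5*U^2, H+a*U];

% inverse of P as in (10.13)
a1 = (gamma-1)*U^2/(2*a^2);  a2 = (gamma-1)/a^2;
Pinv = [0.5*(a1 + U/a), -0.5*(a2*U + 1/a), 0.5*a2;
        1 - a1, a2*U, -a2;
        0.5*(a1 - U/a), -0.5*(a2*U - 1/a), 0.5*a2];

err_hom  = norm(A*W - F);            % homogeneity property (10.9)
err_diag = norm(A - P*Lam*Pinv);     % decomposition A = P*Lam*P^(-1)
err_inv  = norm(Pinv*P - eye(3));    % Pinv is the inverse of P
disp([err_hom err_diag err_inv])     % all expected at round-off level
```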


Definition 10.1. The system (10.7) with the matrix A diagonalizable with 
real eigenvalues is called hyperbolic. 


The hyperbolic character of the system (10.7) has important consequences on the propagation of the information in the flow field. Certain quantities, called invariants,³ are transported along particular curves in the plane $(x,t)$, called characteristics. From a numerical point of view, this suggests a simple way to calculate the solution at any point $P(x,t)$ by gathering all the information transported along the characteristics starting from $P$ and going back to regions where the solution is already known (imposed by the initial condition, for example).

² The reader who has already explored Chap. 1 of this book may notice that this form is similar to that of the convection equation. The underlying idea here is to generalize the analysis of characteristics to the case of a system of PDEs.

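The general form of the equation defining a characteristic is $dx/dt = \lambda$, where $\lambda$ is an eigenvalue of the Jacobian matrix $A$. Since the corresponding invariant $r$ is constant along the characteristic, it satisfies

$$\frac{dr}{dt} = \frac{\partial r}{\partial t} + \frac{\partial r}{\partial x} \frac{dx}{dt} = \frac{\partial r}{\partial t} + \lambda \frac{\partial r}{\partial x} = 0. \tag{10.14}$$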
In the simplest case of the convection equation $\partial u/\partial t + c\, \partial u/\partial x = 0$, with constant transport velocity $c$, there exists a single characteristic curve, which is the line $x = ct$, and the corresponding invariant is the solution itself, $r = u$ (see also Chap. 1). From (10.10) we infer that the system (10.7) has three distinct characteristics:

$$C^0: \frac{dx}{dt} = U, \qquad C^+: \frac{dx}{dt} = U + a, \qquad C^-: \frac{dx}{dt} = U - a. \tag{10.15}$$
The invariants can be generally expressed as differential relations (see, for 
instance, Hirsch (1988); Godlewski and Raviart (1996) for details) 


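$$dr^0 = dp - a^2\, d\rho = 0, \qquad dr^+ = dp + \rho a\, dU = 0, \qquad dr^- = dp - \rho a\, dU = 0,$$

which have to be integrated along the corresponding characteristic curves. In the case of an isentropic flow⁴ we obtain

$$r^0 = p/\rho^\gamma, \qquad r^+ = U + \frac{2a}{\gamma-1}, \qquad r^- = U - \frac{2a}{\gamma-1}. \tag{10.16}$$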





The above relations will be used in the following to derive the exact solution 
of the shock tube problem. 
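
The integration leading to (10.16) can be made explicit for $r^+$. The short derivation below is a sketch that assumes an isentropic flow written as $p = K \rho^\gamma$ with $K$ constant (the symbol $K$ is introduced here only for illustration); the case of $r^-$ is identical up to a sign.

```latex
\begin{align*}
a^2 &= \frac{\gamma p}{\rho} = \gamma K \rho^{\gamma-1}
  \;\Longrightarrow\; \frac{da}{a} = \frac{\gamma-1}{2}\,\frac{d\rho}{\rho},\\
dp &= \gamma K \rho^{\gamma-1}\, d\rho = a^2\, d\rho
  \quad (\text{so } dr^0 = 0 \text{ holds identically}),\\
0 &= dp + \rho a\, dU
   = \rho a \left( \frac{a}{\rho}\, d\rho + dU \right)
   = \rho a \left( \frac{2}{\gamma-1}\, da + dU \right),\\
r^+ &= U + \frac{2a}{\gamma-1} = \text{const along } C^+ .
\end{align*}
```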


Definition 10.2. The nonlinear hyperbolic system of PDEs (10.5) and piece- 
wise constant initial condition (10.6) define the Riemann problem. 


³ For a rigorous analysis of hyperbolic systems of PDEs and related definitions (in particular the definition of Riemann invariants), the reader can refer to Hirsch (1988); LeVeque (1992); Godlewski and Raviart (1996).

⁴ The entropy variation of a perfect gas during its evolution starting from a reference state A is $s - s_A = C_v \ln\left( \frac{p}{p_A} \left( \frac{\rho_A}{\rho} \right)^{\gamma} \right)$, where $C_v = R/(\gamma-1)$ is the heat coefficient under constant volume. For an isentropic flow, since the entropy remains unchanged ($s = s_A$), we deduce that the ratio $p/\rho^\gamma$ is constant.

10.2.1 Dimensionless Equations


When building numerical applications we usually prefer to remove physical 
units from equations and work with dimensionless variables. This simplifies the 
problem formulation and may reduce computational round-off errors. Physical 
variables in previous equations are nondimensionalized (or scaled) using a 
reference state defined by the parameters of the working section: 


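$$\begin{aligned}
&\rho^* = \rho/\rho_R, \quad U^* = U/a_R, \quad a^* = a/a_R, \quad T^* = T/(\gamma T_R), \\
&p^* = p/(\rho_R a_R^2) = p/(\gamma p_R), \quad E^* = E/(\rho_R a_R^2), \quad H^* = H/a_R^2.
\end{aligned} \tag{10.17}$$

We also nondimensionalize the space and time variables as $x^* = x/L$, $t^* = t/(L/a_R)$, where $L$ is the length of the tube.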

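The Euler equations for the dimensionless variables (denoted by the star superscript) keep the same differential form as previously:

$$\frac{\partial}{\partial t^*} \begin{pmatrix} \rho^* \\ \rho^* U^* \\ E^* \end{pmatrix} + \frac{\partial}{\partial x^*} \begin{pmatrix} \rho^* U^* \\ \rho^* U^{*2} + p^* \\ (E^* + p^*) U^* \end{pmatrix} = 0. \tag{10.18}$$

Dimensionless total energy $E^*$ and total enthalpy $H^*$ become

$$E^* = \frac{p^*}{\gamma-1} + \frac{\rho^*}{2} U^{*2}, \qquad H^* = \frac{a^{*2}}{\gamma-1} + \frac{1}{2} U^{*2}. \tag{10.19}$$

Differences with respect to the previous physical equations appear in the equation of state

$$p^* = \rho^* T^* \tag{10.20}$$

and in the definition of the speed of sound

$$a^* = \sqrt{\gamma \frac{p^*}{\rho^*}} = \sqrt{\gamma T^*}. \tag{10.21}$$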


In the interests of simplicity, we drop the star superscript in subsequent equa- 
tions; only dimensionless variables will be considered in the following sections. 
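
The scaling (10.17) can be illustrated with a few lines of MATLAB. The physical values below (air, SI units) and the structure `s` holding the dimensionless quantities are assumptions made for this sketch; only the scaling rules themselves come from (10.17).

```matlab
gamma = 1.4;  R = 287;                  % assumed gas: air, R in J/(kg K)
% hypothetical physical states on both sides of the diaphragm (SI units)
pL = 1.0e5;  TL = 300;                  % driven (left) section
pR = 1.0e4;  TR = 240;                  % working (right) section
rhoL = pL/(R*TL);  rhoR = pR/(R*TR);    % equation of state (10.3)
aR = sqrt(gamma*R*TR);                  % reference speed of sound (10.4)

% scaling (10.17), with the working section as reference state
s.rhoR = rhoR/rhoR;  s.pR = pR/(rhoR*aR^2);  s.TR = TR/(gamma*TR);
s.rhoL = rhoL/rhoR;  s.pL = pL/(rhoR*aR^2);  s.TL = TL/(gamma*TR);
s.aR = sqrt(gamma*s.TR);                % dimensionless speed of sound (10.21)
s.aL = sqrt(gamma*s.TL);
disp(s)
```

With this reference state the right-hand values reduce to $\rho_R = 1$, $p_R = 1/\gamma$, and $a_R = 1$, and the dimensionless equation of state (10.20) holds on both sides.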


10.2.2 Exact Solution 


The exact solution of the shock tube problem follows the physical and math- 
ematical descriptions given in previous sections. The tube is separated (see 
Fig. 10.1) into four uniform regions, i.e., with constant parameters (pressure, 
density, temperature, and velocity): the left (L) and right (R) regions (which 
keep the parameters imposed by the initial condition) and two intermediate 
regions, denoted by subscripts 1 and 2. 

It is important to identify these regions in the $(x,t)$ plane (see Fig. 10.2). All the waves are centered at the initial position of the diaphragm ($t = 0$, $x = x_0$). Since the shock and the contact discontinuity propagate in uniform zones, they have constant velocities and hence are displayed as straight lines in the $(x,t)$ diagram. The expansion wave extends through the new zone (E), the expansion fan, in which the flow parameters vary continuously (see below). Recall that the shock wave and the contact discontinuity propagate to the right, while the expansion fan moves to the left.





Fig. 10.2. Diagram in the (x,t) plane of the exact solution of the shock tube problem (left). Characteristics used to calculate the exact solution (right).


We start the calculation of the exact solution by writing the dimensionless 
parameters of the (L) and (R) regions (which are in fact the input parameters 
for a computer program): 


Region (R): $\rho_R = 1$, $p_R = 1/\gamma$, $T_R = 1/\gamma$, $a_R = 1$, $U_R = 0$, (10.22)
Region (L): $p_L$, $\rho_L$, $T_L$, $a_L$, $U_L = 0$ (given quantities). (10.23)


We then use the jump relations across the discontinuities and take into account 
the propagation of the information along the characteristics, as follows: 


1. The shock wave implies the discontinuity of all the parameters of the
flow. The jump between regions (1) and (R) is described by the Rankine-
Hugoniot relations (see, for example, Hirsch (1988)):

\[ \frac{p_1}{p_R} = \frac{2\gamma}{\gamma + 1}\,M_s^2 - \frac{\gamma - 1}{\gamma + 1}, \tag{10.24} \]

\[ \frac{\rho_R}{\rho_1} = \frac{2}{\gamma + 1}\,\frac{1}{M_s^2} + \frac{\gamma - 1}{\gamma + 1}, \tag{10.25} \]

\[ U_1 = \frac{2}{\gamma + 1}\left(M_s - \frac{1}{M_s}\right), \tag{10.26} \]

where $M_s$ is the Mach number of the shock, defined in physical units as
the ratio of the propagation speed of the shock to the speed of sound $a_R$.
We note that with our scaling, $M_s = U_s$, where $U_s$ is the dimensionless
propagation speed of the shock. Recall that $U_s$ is constant.

2. The contact discontinuity is in fact a discontinuity of the density function,
the pressure and the velocity being continuous. Hence

\[ U_2 = U_1, \qquad p_2 = p_1. \tag{10.27} \]


3. We now link the parameters of region (2) to those of region (L). For
this purpose, we consider a point P inside region (2) and draw the
characteristics passing through this point (see Fig. 10.2). We notice that
only the $C^0$ and $C^+$ characteristics cross the expansion fan and carry
information from region (L). Using the expressions (10.15) for the invariants
$r^0$ and $r^+$ and taking into account that $U_L = 0$, we obtain

\[ \frac{\rho_2}{\rho_L} = \left(\frac{p_2}{p_L}\right)^{1/\gamma}, \qquad
U_2 = \frac{2}{\gamma - 1}\,(a_L - a_2). \tag{10.28} \]





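4. Finally, we combine the previous relations to obtain an implicit equation
for the unknown $M_s$. The detailed calculation follows:

\[ a_L - a_2 \overset{(10.28)}{=} \frac{\gamma - 1}{2}\,U_2
\overset{(10.27)}{=} \frac{\gamma - 1}{2}\,U_1
\overset{(10.26)}{=} \frac{\gamma - 1}{\gamma + 1}\left(M_s - \frac{1}{M_s}\right). \]

Since

\[ \frac{a_2}{a_L} \overset{(10.21)}{=} \left(\frac{p_2}{p_L}\,\frac{\rho_L}{\rho_2}\right)^{1/2}
\overset{(10.28)}{=} \left(\frac{p_2}{p_L}\right)^{\frac{\gamma - 1}{2\gamma}}
\overset{(10.27)}{=} \left(\frac{p_1}{p_L}\right)^{\frac{\gamma - 1}{2\gamma}}, \]

we replace $p_1/p_R$ from (10.24) and finally get the following compatibility
equation:

\[ M_s - \frac{1}{M_s} = a_L\,\frac{\gamma + 1}{\gamma - 1}
\left\{ 1 - \left[ \frac{p_R}{p_L}\left( \frac{2\gamma}{\gamma + 1}\,M_s^2
- \frac{\gamma - 1}{\gamma + 1} \right) \right]^{\frac{\gamma - 1}{2\gamma}} \right\}. \tag{10.29} \]

Once this implicit nonlinear equation is solved (using an iterative Newton
method, for example), the value of $M_s$ will be used in previous relations
to determine all the parameters of uniform regions (1) and (2).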
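
The four steps above can be sketched in a few MATLAB lines. This is only an illustration under stated assumptions, not the program of Sect. 10.4: the variable names (`compat`, `p1_over_pR`, and so on) are hypothetical, Sod's data in dimensionless form are hard-coded, and the bracket `[1 5]` passed to fzero is an assumed interval over which the residual of (10.29) changes sign for these data.

```matlab
% Sod's data in dimensionless form (rho_R = 1, p_R = 1/gamma, a_R = 1)
gamma = 1.4;
pR    = 1/gamma;
rhoL  = 8;   pL = 10/gamma;
aL    = sqrt(gamma*pL/rhoL);                 % speed of sound, (10.21)

% Pressure ratio p1/pR across the shock, relation (10.24)
p1_over_pR = @(Ms) 2*gamma/(gamma+1)*Ms.^2 - (gamma-1)/(gamma+1);

% Residual of the compatibility equation (10.29), written as f(Ms) = 0
compat = @(Ms) Ms - 1./Ms - aL*(gamma+1)/(gamma-1) ...
    *(1 - (pR/pL*p1_over_pR(Ms)).^((gamma-1)/(2*gamma)));

Ms = fzero(compat, [1 5]);                   % assumed bracket with a sign change

% Parameters of the uniform regions (1) and (2)
p1   = pR*p1_over_pR(Ms);                             % (10.24)
rho1 = 1/(2/(gamma+1)/Ms^2 + (gamma-1)/(gamma+1));    % (10.25), with rho_R = 1
U1   = 2/(gamma+1)*(Ms - 1/Ms);                       % (10.26)
U2   = U1;   p2 = p1;                                 % (10.27)
rho2 = rhoL*(p2/pL)^(1/gamma);                        % (10.28)
a2   = aL - (gamma-1)/2*U2;                           % (10.28)
```

A simple consistency check is that `a2`, obtained from the invariant, should agree with `sqrt(gamma*p2/rho2)` computed from the state of region (2).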








To complete the exact solution, we need to determine the extent of each region
(i.e., calculate the values of the abscissas $x_1$, $x_2$, $x_3$, $x_4$ in
Fig. 10.2) for a given time value $t$. We proceed as follows:


• The expansion fan (E) is left-bounded by the $C^-$ characteristic starting
from the point B, considered to belong to region (L), i.e., the line of
slope $dx/dt = -a_L$. The right bound of the expansion fan is the $C^-$
characteristic starting from the same point B, but considered this time to
belong to region (2), i.e., the line of slope $dx/dt = U_2 - a_2$. The values
of $x_1$ and $x_2$ are consequently

\[ x_1 = x_0 - a_L t, \qquad x_2 = x_0 + (U_2 - a_2)\,t. \tag{10.30} \]

• Consider now a point $(x,t)$ inside the region (E), i.e., $x_1 \le x \le x_2$.
Since this point belongs to a $C^-$ characteristic starting from B, necessarily
$(x - x_0)/t = U - a$. Using the $C^+$ characteristic coming from region (L), we
also get that $a + (\gamma - 1)U/2 = a_L$. Combining these two relations and
remembering that the flow is isentropic, we can conclude that the exact
solution inside the expansion fan is

\[ U = \frac{2}{\gamma + 1}\left(a_L + \frac{x - x_0}{t}\right), \qquad
a = a_L - \frac{\gamma - 1}{2}\,U, \qquad
p = p_L \left(\frac{a}{a_L}\right)^{\frac{2\gamma}{\gamma - 1}}. \tag{10.31} \]

• The contact discontinuity is transported at constant velocity $U_2 = U_1$, so

\[ x_3 = x_0 + U_2 t. \tag{10.32} \]

• Since the shock wave also propagates at constant dimensionless velocity
$U_s = M_s$, we finally obtain

\[ x_4 = x_0 + M_s t. \tag{10.33} \]
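
As an illustration of (10.30)-(10.33), the following sketch locates the four abscissas and evaluates the solution inside the expansion fan. The numerical value of `Ms` is an assumption (an approximate root of (10.29) for Sod's data, hard-coded here to keep the example self-contained), and all variable names are hypothetical.

```matlab
gamma = 1.4;   x0 = 0.5;   t = 0.2;
rhoL = 8;   pL = 10/gamma;   aL = sqrt(gamma*pL/rhoL);
Ms = 1.6556;                           % assumed approximate root of (10.29)
U2 = 2/(gamma+1)*(Ms - 1/Ms);          % (10.26) and (10.27)
a2 = aL - (gamma-1)/2*U2;              % (10.28)

x1 = x0 - aL*t;                        % left bound of the fan,  (10.30)
x2 = x0 + (U2 - a2)*t;                 % right bound of the fan, (10.30)
x3 = x0 + U2*t;                        % contact discontinuity,  (10.32)
x4 = x0 + Ms*t;                        % shock wave,             (10.33)

% Solution inside the expansion fan, (10.31), at a few sample points
x   = linspace(x1, x2, 5);
U   = 2/(gamma+1)*(aL + (x - x0)/t);
a   = aL - (gamma-1)/2*U;
p   = pL*(a/aL).^(2*gamma/(gamma-1));
rho = gamma*p./a.^2;                   % from a^2 = gamma*p/rho, (10.21)
```

At the left bound of the fan the sketch recovers the state of region (L), and at the right bound the velocity equals $U_2$, which gives a quick check of the formulas.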


Remark 10.1. The exact solution $W(x,t)$ of the shock tube problem depends
only on the ratio $(x - x_0)/t$, as one would have expected from the
characteristics analysis of the Euler system of PDEs.





Exercise 10.1. Write a MATLAB function to compute the exact solution 
of the shock tube problem. The definition header of the function will be as 
follows: 


function uex=HYP_shock_exact(x,x0,t)
% Input arguments:
%    x    vector of abscissas of dimension M
%    x0   the initial position of the diaphragm
%    t    time at which the solution is calculated
% Output arguments:
%    uex  vector of dimensions (3,M) containing the solution as
%         uex(1,1:M) the density
%         uex(2,1:M) the velocity
%         uex(3,1:M) the pressure


Plot the dimensionless exact solution ($\rho(x)$, $U(x)$, and $p(x)$) at time
$t = 0.2$. Consider $x \in [0,1]$, $x_0 = 0.5$, and a regular (equidistant) grid
with $M = 81$ computational points. The physical parameters correspond to those
used by Sod (see also Hirsch, 1988): $\gamma = 1.4$, $\rho_L = 8$, $p_L = 10/\gamma$.

Hint: define all the physical parameters as global variables; use the MATLAB
built-in function fzero to solve the compatibility equation (10.29).





The expected result is displayed in Fig. 10.3. This solution was obtained
using the MATLAB program presented in Sect. 10.4 at page 232.

Fig. 10.3. Exact solution of the shock tube problem (Sod’s data) at time t = 0.2.


10.3 Numerical Solution 


The first idea one would have in mind when attempting to numerically solve 
the Euler system of PDEs (10.18) is to use elementary discretization methods 
discussed in Chap. 1 for scalar PDEs, for example, an Euler or a Runge-Kutta 
method for the time integration and centered finite differences for the space 
discretization. We shall see, however, that such methods are not appropriate 
to compute discontinuous solutions, since they generate nonphysical oscilla- 
tions. This drawback of the space-centered schemes for computing the shock 
tube problem will be illustrated using the more sophisticated Lax-Wendroff 
and MacCormack schemes. We shall also give a quick description of upwind 
schemes that take into account the hyperbolic character of the system and 
allow a better numerical solution. Results using Roe’s upwind scheme will be 
discussed at the end. 








10.3.1 Lax—Wendroff and MacCormack Centered Schemes 








The space-centered schemes were historically the first to be derived to solve 
hyperbolic systems. The two most popular schemes, the Lax and Wendroff 
scheme and the MacCormack scheme, are still used in some industrial numer- 
ical codes. We shall apply these schemes to solve the Euler system (10.18) 
written in the conservative form 


\[ \frac{\partial W}{\partial t} + \frac{\partial}{\partial x} F(W) = 0. \tag{10.34} \]


We use a regular (or equidistant) discretization of the domain of definition of
the problem $(x,t) \in [0,1] \times [0,T]$:

• in space

\[ x_j = (j - 1)\,\delta x, \quad \delta x = \frac{1}{M - 1}, \quad j = 1, 2, \ldots, M, \tag{10.35} \]

• and in time

\[ t^n = (n - 1)\,\delta t, \quad \delta t = \frac{T}{N - 1}, \quad n = 1, 2, \ldots, N. \tag{10.36} \]

For both schemes, the numerical solution $W_j^{n+1}$ (at time $t_{n+1}$ and space
position $x_j$) is computed in two steps (a predictor and a corrector step)
following the formulas displayed in Fig. 10.4.


Fig. 10.4. Formulas of Lax-Wendroff and MacCormack space-centered schemes.
Schematic representation of their predictor and corrector steps.


We discuss in the following some remarkable features of these schemes. 

1. (Boundary values.) From the schematic representation of the predictor
and corrector steps in Fig. 10.4, we notice that only the components
$j = 2, \ldots, (M - 1)$ of the solution are calculated. The remaining
components for $j = 1$ and $j = M$ need to be prescribed by appropriate
boundary conditions. Since the tube is assumed infinite, we impose
$W_1^n = W_L$ and $W_M^n = W_R$ at any time level $t_n$. Practically, this is
equivalent to leaving unchanged the first and last components of the solution
vector. Meanwhile, it is obvious that the computation must stop before one of
the waves (expansion or shock) hits the boundary.

2. (Propagation of information.) The predictor step of the Lax—Wendroff
scheme computes an intermediate solution at interfaces $(j + \frac12)$ and
$(j - \frac12)$ using forward finite differences. These intermediate values are
then used in the centered finite difference scheme of the corrector step.

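The MacCormack scheme combines backward differences for the predictor
step with forward differences for the corrector step. We can show in fact that
the idea behind this scheme is the following Taylor expansion:

\[ W_j^{n+1} = W_j^n + \overline{\left(\frac{\partial W}{\partial t}\right)}_j \delta t, \tag{10.37} \]

where

\[ \overline{\left(\frac{\partial W}{\partial t}\right)}_j
= \frac{1}{2}\left[ \left(\frac{\partial W}{\partial t}\right)_j^{n}
+ \left(\frac{\partial \tilde{W}}{\partial t}\right)_j \right]
= \frac{1}{2}\left[ \frac{\tilde{W}_j - W_j^n}{\delta t}
- \frac{F(\tilde{W}_{j+1}) - F(\tilde{W}_j)}{\delta x} \right] \]

is an approximation of the first derivative in time.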

In conclusion, information is taken from both sides of the computed
point $j$. The information propagation along characteristics is not taken into
account, since no distinction is made between upstream and downstream
influences. We shall see that this lack of physics in the numerical schemes will
generate unwanted (nonphysical) oscillations of the solution.

3. (Accuracy.) Both schemes use a three-point stencil $(j - 1, j, j + 1)$ to
reach second-order accuracy in time and space.

4. (Stability.) Both schemes are explicit and consequently subject to
stability conditions. Similar to the (scalar) convection equation (see Chap. 1),
we can write the stability (or CFL⁵) condition in the general form

\[ \frac{\delta t}{\delta x}\,\max_i \{|\lambda_i|\} \le 1, \]

where $\lambda_i$, $i = 1,2,3$, are the eigenvalues of the Jacobian matrix
$\partial F/\partial W$, regarded here as propagation speeds of the
corresponding characteristic waves ($dx/dt = \lambda$). Using (10.15), we
obtain the stability condition

\[ (|U| + a)\,\frac{\delta t}{\delta x} \le 1. \tag{10.38} \]

For numerical applications, this condition is used to compute the time step

\[ \delta t = \mathrm{cfl} \cdot \frac{\delta x}{|U| + a}, \quad \text{with } \mathrm{cfl} < 1. \tag{10.39} \]
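
A minimal sketch of how (10.39) might be evaluated on the whole grid is given below. It assumes that the most restrictive value of $|U| + a$ over all grid points is used, which the formula leaves implicit; the anonymous function `calc_dt` is a hypothetical stand-in and is not claimed to match the book's own function.

```matlab
gamma = 1.4;   M = 81;   dx = 1/(M-1);   cfl = 0.95;

% Sod's initial state, stored as conservative variables W = (rho, rho*U, E)
rho = [8*ones(1,41), ones(1,40)];
U   = zeros(1,M);
p   = [10/gamma*ones(1,41), ones(1,40)/gamma];
w   = [rho; rho.*U; p/(gamma-1) + 0.5*rho.*U.^2];

% Hypothetical helper: time step (10.39) with the largest |U| + a on the grid;
% the pressure is recovered from W as p = (gamma-1)*(E - rho*U^2/2)
calc_dt = @(w, dx, cfl) cfl*dx / max( abs(w(2,:)./w(1,:)) + ...
    sqrt(gamma*(gamma-1)*(w(3,:) - 0.5*w(2,:).^2./w(1,:))./w(1,:)) );

dt = calc_dt(w, dx, cfl);
```

For the initial data the fluid is at rest, so the bound is set by the largest speed of sound, here $a_L = \sqrt{1.25}$.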


5 Courant-Friedrichs-Lewy

Exercise 10.2. For the same physical and numerical parameters as in the
previous exercise, compute the numerical solution of the shock tube problem
at t = 0.2 using Lax—Wendroff and MacCormack centered schemes. Compare
to the exact solution and comment on the results. Hints:

• set an array w(1:3,1:M) to store the discrete values of the vector
$W = (\rho, \rho U, E)^t$ of conservative variables;

• using (10.39) with cfl = 0.95, compute the time step in a separate function
function dt = HYP_calc_dt(w,dx,cfl);

• write a function to compute F(W);

• use vectorial programming to translate the formulas in Fig. 10.4 into MATLAB
program lines (avoid loops!); for example, the predictor step of the
Lax-Wendroff scheme will be coded in a single line:

wtilde=0.5*(w(:,1:M-1)+w(:,2:M))-0.5*dt/dx*(F(:,2:M)-F(:,1:M-1));

• for each scheme, superimpose numerical and exact solutions for
$(\rho, U, p)$ as in Fig. 10.5.
A solution of this exercise is proposed in Sect. 10.4 at page 232. 
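
For the MacCormack scheme, one time step can be sketched along the lines of the description given above (backward differences in the predictor, forward differences in the corrector, boundary columns left unchanged). The sketch below is an assumption-laden illustration rather than the program of Sect. 10.4: it is applied to smooth, hypothetical test data (a density bump with uniform velocity and pressure) instead of Sod's data, and the helper names `pres` and `flux` are invented for the example.

```matlab
gamma = 1.4;   M = 81;   dx = 1/(M-1);   cfl = 0.95;
x = (0:M-1)*dx;

% Assumed smooth test data (not Sod's): density bump, uniform U and p
rho = 1 + 0.2*exp(-100*(x - 0.5).^2);
U   = 0.5*ones(1,M);
p   = ones(1,M)/gamma;
w   = [rho; rho.*U; p/(gamma-1) + 0.5*rho.*U.^2];

% Pressure and flux F(W) as functions of W = (rho, rho*U, E)
pres = @(w) (gamma-1)*(w(3,:) - 0.5*w(2,:).^2./w(1,:));
flux = @(w) [w(2,:); w(2,:).^2./w(1,:) + pres(w); (w(3,:) + pres(w)).*w(2,:)./w(1,:)];

% Time step from (10.39), using the largest |U| + a on the grid
dt = cfl*dx/max(abs(U) + sqrt(gamma*p./rho));

% Predictor: backward differences, j = 2..M (first column kept)
F = flux(w);
wtilde = w;
wtilde(:,2:M) = w(:,2:M) - dt/dx*(F(:,2:M) - F(:,1:M-1));

% Corrector: forward differences, j = 2..M-1 (boundary columns kept)
Ftilde = flux(wtilde);
wnew = w;
wnew(:,2:M-1) = 0.5*(w(:,2:M-1) + wtilde(:,2:M-1)) ...
    - 0.5*dt/dx*(Ftilde(:,3:M) - Ftilde(:,2:M-1));
```

With these test data the exact solution is a pure transport of the density profile, so the velocity and pressure recovered from `wnew` should remain uniform up to rounding errors.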


The numerical results of both schemes, displayed in Fig. 10.5, show good 
accuracy in smooth regions, whereas unwanted oscillations appear at the in- 
terfaces between different regions of the solution. The contact discontinuity is 
also poorly captured. The MacCormack scheme seems to capture the shock 
discontinuity better, but introduces higher-amplitude oscillations at the end 
of the expansion wave where the flow is strongly accelerated.

Fig. 10.5. Numerical results for the shock tube problem (Sod’s parameters) using
centered schemes. Lax—Wendroff scheme (top) and MacCormack scheme (bottom).

Artificial Dissipation


The oscillations generated by the centered schemes around discontinuities can 
be damped by adding a supplementary term to the initial equation (10.34): 


\[ \frac{\partial W}{\partial t} + \frac{\partial}{\partial x} F(W)
- \delta x^2\,\frac{\partial}{\partial x}\left( D(x)\,\frac{\partial W}{\partial x} \right) = 0. \tag{10.40} \]

The mathematical form of this term is inspired by the heat equation
(discussed in Chap. 1). The idea is to simulate the effects of a physical
dissipation (or diffusion) process, which is well known to have a smoothing
effect⁶ on the solution. Since the dissipation term is proportional to the
gradient $\partial W/\partial x$ of the solution, the smoothing will be
important in regions with sharp gradients (such as the shock discontinuity),
where numerical oscillations are expected to disappear.

The coefficient D(x), also called artificial viscosity by analogy with Navier—
Stokes equations (see Chap. 12), has to be positive to ensure a stabilizing
effect⁷ on the numerical solution. Moreover, its value has to be chosen such
that the influence of the artificial term is negligible (i.e., of an order greater
than or equal to the truncation error) in the smooth regions of the solution.

Several methods have been proposed to prescribe the artificial viscosity 
D(x) and to modify classical centered schemes accordingly (see, for instance, 
Hirsch (1988), Fletcher (1991)). We illustrate the simplest technique, which
considers a constant coefficient D(x) = D and writes (10.40) in the
conservative form (10.34) with a modified flux $F^*(W)$:

\[ \frac{\partial W}{\partial t} + \frac{\partial}{\partial x} F^*(W) = 0,
\quad \text{where} \quad F^*(W) = F(W) - \delta x^2\,D\,\frac{\partial W}{\partial x}. \tag{10.41} \]

In order to use the same three-point stencil to define the schemes, the new
vector $F^*(W)$ will be discretized

• using backward differences in the predictor step

\[ F^*(W_j) = F(W_j) - (D\,\delta x)(W_j - W_{j-1}), \tag{10.42} \]

• and forward differences in the corrector step

\[ F^*(\tilde{W}_j) = F(\tilde{W}_j) - (D\,\delta x)(\tilde{W}_{j+1} - \tilde{W}_j). \tag{10.43} \]


Exercise 10.3. Modify the previous program by adding an artificial dissi- 
pation term to both Lax—Wendroff and MacCormack schemes. Use (10.42)- 
(10.43) to modify the flux F(W). Discuss the effect of the value of the artificial 
viscosity D (take 0 < D < 10). What is the influence of D on the value of the 
time step? 


6 This smoothing effect is nicely illustrated for the heat equation in Chap. 1, Ex- 
ercise 1.10. 
7 The heat equation with negative diffusivity has physically unstable solutions!

The results obtained with an artificial dissipation term are displayed in
Fig. 10.6. Numerical oscillations are reduced near the shock and expansion 
waves, but large dissipation is also introduced in other regions of the solution. 
In particular, the contact discontinuity (see the graph for p(x)) is consider- 
ably smeared. Increasing the value of D allows one to completely remove the 
oscillations, but the overall accuracy is not satisfactory. More sophisticated 
methods have been proposed (see the references at the end of the chapter) to 
render the dissipation more selective with respect to the nature of disconti- 
nuities, but the general tradeoff between damping the oscillations and overall 
accuracy suggests that the artificial dissipation does not bring a real solution 
to the problem. A different approach, including more physics in the numerical 
approximation, is presented in the next section.

Fig. 10.6. Numerical results for the shock tube problem (Sod’s parameters) using
centered schemes with artificial dissipation. Lax-Wendroff scheme (top) and
MacCormack scheme (bottom).


10.3.2 Upwind Schemes (Roe’s Approximate Solver) 


The numerical oscillations generated by the centered schemes discussed in
the previous section originate from the fact that these schemes completely
ignore the hyperbolic character of the Euler system of PDEs, in particular the
propagation of the information along characteristics. These important
(physical) features will be considered in deriving upwind schemes.

Physical information can be introduced at different levels of the numerical 
approximation. We distinguish between: 










1. flux splitting upwind schemes, which use different directional
discretizations of the flux F(W), depending on the sign of the eigenvalues
$\lambda$ of the Jacobian matrix (10.8); since $\lambda$ corresponds to the
propagation speed of the associated characteristic, these schemes include only
the information on the direction of propagation of waves (up- or downstream);

2. Godunov-type schemes, which introduce a higher level of physical
approximation by considering a discretization based on the exact solution of
the Riemann problem at each interface between computational points; when
the local Riemann problem is solved approximately, we speak of approximate
Riemann solvers.


The following sections present the basic principle of Godunov schemes and 
the Riemann approximate solver of Roe. 


Godunov-Type Schemes 


The basic principle of a Godunov-type scheme is the following: the solution
$W^n$ is considered to be piecewise constant over each grid cell, defined as the
interval $]x_{j-1/2}, x_{j+1/2}[$; this allows us to define locally a Riemann
problem at each interface between the cells; each local Riemann problem is
solved exactly to calculate the solution $W^{n+1}$ at the next time level.

Fig. 10.7. Principle of a Godunov-type scheme.

More precisely, the numerical solution is advanced from time level
$t_n = n\,\delta t$ to $t_{n+1} = t_n + \delta t$ in three steps (see Fig. 10.7):





Step 1. Using the known values $W_j^n$, define the piecewise constant function

\[ W^n(x) = W_j^n, \quad x \in\, ](j - 1/2)\,\delta x,\ (j + 1/2)\,\delta x[. \tag{10.44} \]

Step 2. Calculate the solution function $\tilde{W}^{n+1}(x)$,
$x \in\, ](j - 1/2)\,\delta x,\ (j + 1/2)\,\delta x[$, by gathering the exact
solutions of the two Riemann problems defined at interfaces $(j - \frac12)$ and
$(j + \frac12)$. This step requires that the waves issued from the two
neighboring Riemann problems not intersect. This implies that the time step
should be limited such that

\[ \max\,(|U| + a)_{j+1/2}\,\frac{\delta t}{\delta x} \le \frac{1}{2}. \tag{10.45} \]

Step 3. Obtain the solution $W^{n+1}(x)$, which is also a piecewise constant
function, by averaging $\tilde{W}^{n+1}(x)$ over each cell:

\[ W_j^{n+1} = \frac{1}{\delta x} \int_{(j-1/2)\delta x}^{(j+1/2)\delta x} \tilde{W}^{n+1}(x)\,dx. \tag{10.46} \]


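We can show that the Godunov scheme can be written in the following
conservative form:

\[ \frac{W_j^{n+1} - W_j^n}{\delta t}
+ \frac{\Phi(W_j^n, W_{j+1}^n) - \Phi(W_{j-1}^n, W_j^n)}{\delta x} = 0, \tag{10.47} \]

where the flux vector is generally defined as

\[ \Phi(W_j^n, W_{j+1}^n) = F\big(\tilde{W}^{n+1}_{j+1/2}\big). \tag{10.48} \]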


The advantage of the conservative form is that it is valid over the entire domain 
of definition of the problem, even though the solution is discontinuous. This 
form is also used to derive approximate Riemann solvers. The exact form of 
the flux vector will be presented in the next section for the Roe solver. 
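
To illustrate the conservative form (10.47) on the simplest possible case, the sketch below applies it to the scalar convection equation $u_t + c\,u_x = 0$ with $c > 0$ (an assumed model problem, not the Euler system). For this equation the exact solution of the Riemann problem at an interface is the left state, so the Godunov flux reduces to $\Phi(u_j, u_{j+1}) = c\,u_j$; all names are hypothetical.

```matlab
c = 1;   M = 101;   dx = 1/(M-1);
dt = 0.5*dx/c;                        % satisfies the analogue of (10.45)

u = [ones(1,30), zeros(1,M-30)];      % piecewise constant initial data
phi = @(ul, ur) c*ul;                 % Godunov flux for c > 0 (left state)

for n = 1:40
    Phi = phi(u(1:M-1), u(2:M));      % fluxes at interfaces j+1/2, j = 1..M-1
    % conservative update (10.47) for the interior points j = 2..M-1
    u(2:M-1) = u(2:M-1) - dt/dx*(Phi(2:M-1) - Phi(1:M-2));
end
```

Because the update only exchanges fluxes between neighboring cells, the sum of the values changes solely through the boundary fluxes: here it grows by 0.5 per step, which is the amount entering through the left boundary.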





Roe’s Approximate Solver 


The approximate solver of Roe is based on a simple and ingenious idea: the
Riemann problem (10.7) at interface $(j + \frac12)$ is replaced by the linear
Riemann problem

\[ \frac{\partial \tilde{W}}{\partial t} + A_{j+1/2}\,\frac{\partial \tilde{W}}{\partial x} = 0,
\qquad \tilde{W}(x, n\,\delta t) =
\begin{cases} W_j^n, & x \le (j + \frac12)\,\delta x, \\
W_{j+1}^n, & x > (j + \frac12)\,\delta x. \end{cases} \tag{10.49} \]

The first question raised by this approach is how to properly define the matrix
$A_{j+1/2}$, which depends on $W_j^n$ and $W_{j+1}^n$. This matrix is a priori
chosen such that:

1. The hyperbolic character of the initial equation is conserved by the linear
problem; hence $A_{j+1/2}$ admits a decomposition similar to (10.12):

\[ A_{j+1/2} = P_{j+1/2}\,\Lambda_{j+1/2}\,P_{j+1/2}^{-1}. \tag{10.50} \]

In order to take into account the sign of the propagation speed of
characteristic waves, it is useful to define the following matrices:

• $\mathrm{sign}(A_{j+1/2}) = P_{j+1/2}\,(\mathrm{sign}(\Lambda))\,P_{j+1/2}^{-1}$,
where $\mathrm{sign}(\Lambda)$ is the diagonal matrix defined by the signs of
the eigenvalues $\lambda_i$: $\mathrm{sign}(\Lambda) = \mathrm{diag}(\mathrm{sign}\,\lambda_i)$.

• $|A_{j+1/2}| = P_{j+1/2}\,|\Lambda|\,P_{j+1/2}^{-1}$, where
$|\Lambda| = \mathrm{diag}(|\lambda_i|)$.

2. The linear Riemann problem is consistent with the initial problem, i.e.,
for all variables $u$,

\[ A_{j+1/2}(u, u) = A(u, u). \tag{10.51} \]

3. The numerical scheme is conservative, i.e., for all variables $u$ and $v$,

\[ F(u) - F(v) = A_{j+1/2}(u, v)\,(u - v). \tag{10.52} \]


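For the practical calculation of the matrix $A_{j+1/2}$, the original idea of Roe
was to express the conservative variables $W$ and conservative fluxes $F(W)$
in (10.18) as quadratic forms of the components of the column vector
$Z = \sqrt{\rho}\,(1, U, H)^t = (z_1, z_2, z_3)^t$:

\[ W = \begin{pmatrix} z_1^2 \\ z_1 z_2 \\
\dfrac{1}{\gamma}\,z_1 z_3 + \dfrac{\gamma - 1}{2\gamma}\,z_2^2 \end{pmatrix}, \qquad
F(W) = \begin{pmatrix} z_1 z_2 \\
\dfrac{\gamma - 1}{\gamma}\,z_1 z_3 + \dfrac{\gamma + 1}{2\gamma}\,z_2^2 \\
z_2 z_3 \end{pmatrix}. \tag{10.53} \]

Using the following identity, valid for arbitrary quadratic functions $f$, $g$,

\[ (fg)_{j+1} - (fg)_j = \bar{f}\,(g_{j+1} - g_j) + \bar{g}\,(f_{j+1} - f_j),
\quad \text{where} \quad \bar{f} = \frac{f_{j+1} + f_j}{2}, \]

we can find two matrices $B$ and $C$ such that

\[ W_{j+1} - W_j = B\,(Z_{j+1} - Z_j), \qquad F(W_{j+1}) - F(W_j) = C\,(Z_{j+1} - Z_j). \]

This implies that

\[ F(W_{j+1}) - F(W_j) = (C B^{-1})(W_{j+1} - W_j), \tag{10.55} \]

which corresponds exactly to (10.52). Consequently, a natural choice for the
matrix $A_{j+1/2}$ will be

\[ A_{j+1/2} = C B^{-1}. \tag{10.56} \]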


A remarkable property of this matrix (the reader is invited to derive it as an
exercise!) is that it can be calculated from (10.8) by replacing the variables
$(\rho, U, H)$ with the corresponding Roe’s averages

\[ \bar{\rho}_{j+1/2} = R_{j+1/2}\,\rho_j, \qquad
\bar{U}_{j+1/2} = \frac{R_{j+1/2}\,U_{j+1} + U_j}{R_{j+1/2} + 1}, \qquad
\bar{H}_{j+1/2} = \frac{R_{j+1/2}\,H_{j+1} + H_j}{R_{j+1/2} + 1}, \]

\[ \bar{a}_{j+1/2}^2 = (\gamma - 1)\left( \bar{H}_{j+1/2} - \frac{\bar{U}_{j+1/2}^2}{2} \right),
\quad \text{where} \quad R_{j+1/2} = \sqrt{\frac{\rho_{j+1}}{\rho_j}}. \tag{10.57} \]




It is also remarkable that eigenvalue and eigenvector formulas (10.10) and
(10.11) still apply to $A_{j+1/2}$ if one uses the corresponding Roe’s averaged
variables. This considerably simplifies the calculation of matrices
$\mathrm{sign}(A_{j+1/2})$ and $|A_{j+1/2}|$, which accounts for the popularity
of Roe’s approximate solver.
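
The conservation property (10.52) can be checked numerically for Roe's averages. In the sketch below, the two states are arbitrary illustrative values, the helper names are hypothetical, and the matrix `A` is the standard Jacobian of the Euler flux written in terms of $(U, H)$, assumed here to coincide with (10.8), which is not reproduced in this section.

```matlab
gamma = 1.4;

% Two neighboring states given as (rho, U, p); illustrative values only
sL = [8; 0.3; 10/gamma];          % state at point j
sR = [1; -0.2; 1/gamma];          % state at point j+1

cons = @(s) [s(1); s(1)*s(2); s(3)/(gamma-1) + 0.5*s(1)*s(2)^2];   % W
flux = @(s) [s(1)*s(2); s(1)*s(2)^2 + s(3); ...
             (gamma/(gamma-1)*s(3) + 0.5*s(1)*s(2)^2)*s(2)];       % F(W)
enth = @(s) gamma/(gamma-1)*s(3)/s(1) + 0.5*s(2)^2;                % H

% Roe's averages (10.57)
R    = sqrt(sR(1)/sL(1));
rhob = R*sL(1);
Ub   = (R*sR(2) + sL(2))/(R + 1);
Hb   = (R*enth(sR) + enth(sL))/(R + 1);
ab   = sqrt((gamma-1)*(Hb - 0.5*Ub^2));

% Jacobian dF/dW in terms of (U, H), evaluated at Roe's averages
A = [0, 1, 0; ...
     0.5*(gamma-3)*Ub^2, (3-gamma)*Ub, gamma-1; ...
     Ub*(0.5*(gamma-1)*Ub^2 - Hb), Hb - (gamma-1)*Ub^2, gamma*Ub];

lhs = flux(sR) - flux(sL);            % F(W_{j+1}) - F(W_j)
rhs = A*(cons(sR) - cons(sL));        % A_{j+1/2} (W_{j+1} - W_j)
```

The two vectors `lhs` and `rhs` should agree to rounding accuracy, which is the discrete statement of (10.52).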

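Once the matrix $A_{j+1/2}$ is defined, the upwinding in Roe’s scheme follows
the general principle of first-order upwind schemes applied to linear systems
(see, for instance, Hirsch (1988) for more details). The flux in the general
conservative form (10.47) becomes for Roe’s solver

\[ \Phi(W_j^n, W_{j+1}^n) = \frac{1}{2}\left\{ F(W_j^n) + F(W_{j+1}^n)
- \mathrm{sign}(A_{j+1/2})\left[ F(W_{j+1}^n) - F(W_j^n) \right] \right\}, \tag{10.58} \]

or, if we use (10.52),

\[ \Phi(W_j^n, W_{j+1}^n) = \frac{1}{2}\left\{ F(W_j^n) + F(W_{j+1}^n)
- |A_{j+1/2}|\,(W_{j+1}^n - W_j^n) \right\}. \tag{10.59} \]

To summarize, Roe’s scheme will be used in the form

\[ W_j^{n+1} = W_j^n - \frac{\delta t}{\delta x}\left[ \Phi(W_j^n, W_{j+1}^n)
- \Phi(W_{j-1}^n, W_j^n) \right], \tag{10.60} \]

with the flux $\Phi$ given by (10.59); the matrix
$|A_{j+1/2}| = P_{j+1/2}\,|\Lambda|\,P_{j+1/2}^{-1}$ will be
calculated using Roe’s averages (10.57) in (10.12) and (10.13).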
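
The flux (10.59) at a single interface can be sketched as follows. This is not the implementation described in Sect. 10.4: instead of the analytical expressions (10.12) and (10.13), which are not reproduced here, the sketch obtains $|A_{j+1/2}|$ from a numerical eigendecomposition with the built-in function eig, and the Jacobian written in terms of $(U, H)$ is assumed to coincide with (10.8). The two states are Sod's left and right states; helper names are hypothetical.

```matlab
gamma = 1.4;
pres = @(w) (gamma-1)*(w(3) - 0.5*w(2)^2/w(1));
flux = @(w) [w(2); w(2)^2/w(1) + pres(w); (w(3) + pres(w))*w(2)/w(1)];
enth = @(w) (w(3) + pres(w))/w(1);               % total enthalpy H

wl = [8; 0; 10/gamma/(gamma-1)];                 % W_j     (Sod, left state)
wr = [1; 0; 1/gamma/(gamma-1)];                  % W_{j+1} (Sod, right state)

% Roe's averages (10.57)
R  = sqrt(wr(1)/wl(1));
Ub = (R*wr(2)/wr(1) + wl(2)/wl(1))/(R + 1);
Hb = (R*enth(wr) + enth(wl))/(R + 1);
ab = sqrt((gamma-1)*(Hb - 0.5*Ub^2));

% Roe matrix: Jacobian in terms of (U, H) at the averaged state
A = [0, 1, 0; ...
     0.5*(gamma-3)*Ub^2, (3-gamma)*Ub, gamma-1; ...
     Ub*(0.5*(gamma-1)*Ub^2 - Hb), Hb - (gamma-1)*Ub^2, gamma*Ub];

[P, Lam] = eig(A);                               % A = P*Lam*inv(P)
absA = real(P*abs(Lam)/P);                       % |A| = P*|Lam|*inv(P)

Phi = 0.5*(flux(wl) + flux(wr) - absA*(wr - wl));   % Roe flux (10.59)
```

The eigenvalues returned by eig should be $\bar{U} - \bar{a}$, $\bar{U}$, and $\bar{U} + \bar{a}$; for these data the resulting mass flux `Phi(1)` is positive, i.e., directed toward the low-pressure side.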
Remark 10.2. Roe’s scheme is first-order accurate in time and space. 


Exercise 10.4. Use Roe’s scheme (10.60) to solve numerically the shock tube 
problem (Sod’s parameters). Compare to the numerical results previously ob- 
tained using centered schemes. 


The results obtained using Roe’s scheme are displayed in Fig. 10.8. Com- 
pared to centered schemes, the numerical solution is smooth, without oscilla- 
tions. The shock wave is accurately and sharply captured, but the scheme 
proves too dissipative around the contact discontinuity, which is strongly 
smeared. 

More accurate Riemann solvers can be derived in the framework of 
Godunov-type schemes by increasing the space accuracy. For example, we 
can use piecewise linear functions in steps 1 and 3 of the Godunov scheme to 
obtain solvers of second order in space. Several other approaches have been 
proposed in the literature to include more physics in the numerical discretiza- 
tion, leading to other classes of numerical methods, including TVD (total 
variation diminishing) and ENO (essentially nonoscillatory) schemes, which 
are now commonly used to solve hyperbolic systems of PDEs. The reader who
wishes to pursue the study of upwind schemes beyond this introductory pre- 
sentation is referred to more specialized texts such as Fletcher (1991); Hirsch 
(1988); LeVeque (1992); Saad (1998).

Fig. 10.8. Numerical computation of the shock tube problem (Sod’s parameters)
using Roe’s approximate solver.


10.4 Solutions and Programs 


The exact solution of the shock tube problem for a given time value t is
computed in the script HYP_shock_tube_exact.m. The compatibility relation
(10.29) is implemented as an implicit function (i.e., f(x) = 0) in the script
HYP_mach_compat.m; this function is used as the first argument of the MATLAB
built-in function fzero to compute the root corresponding to the value of
$M_s$. The final solution, containing the discrete values for $(\rho, U, p)$,
is computed according to relations in Sect. 10.2.2. Note the use of the MATLAB
built-in function find to compute the abscissas x separating the different
regions of the solution.

The main program resulting from successively solving all the exercises of 
this project is HYP_shock_tube.m. After defining the input data (which are the 
parameters of regions (L) and (R)) as global variables, the space discretization 
is built and the solution is initialized using Sod’s parameters. Three main 
arrays are used for the computation: 





• usol(1:3,1:M) to store the nonconservative variables $(\rho, U, p)^t$,
• w(1:3,1:M) for the conservative vector $W = (\rho, \rho U, E)^t$,
• and F(1:3,1:M) for the conservative fluxes F(W).


The program allows one to choose among three numerical schemes: Lax-
Wendroff, MacCormack, and Roe. When a centered scheme is selected, the
value of the artificial dissipation is requested. The numerical solution is
superimposed on the exact solution using the function HYP_plot_graph
implemented in the script HYP_plot_graph.m. The most important functions called
from the main program are:

• HYP_trans_usol_w: computes $W = (\rho, \rho U, E)^t$ from usol $= (\rho, U, p)^t$;

• HYP_trans_w_usol: computes usol $= (\rho, U, p)^t$ from $W = (\rho, \rho U, E)^t$;

• HYP_trans_w_f: computes $F = (\rho U, \rho U^2 + p, (E + p)U)^t$ from
$W = (\rho, \rho U, E)^t$;

• HYP_calc_dt: computes $\delta t = \mathrm{cfl} \cdot \delta x/(|U| + a)$ from
$W = (\rho, \rho U, E)^t$.
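
The three conversions listed above can be sketched with vectorized anonymous functions. The names `usol2w`, `w2usol`, and `w2f` are hypothetical stand-ins introduced for this illustration; the actual functions of the book may be organized differently. The formulas follow from (10.19) and from the definition of F recalled in the list.

```matlab
gamma = 1.4;

% (rho, U, p) -> W = (rho, rho*U, E), with E = p/(gamma-1) + rho*U^2/2
usol2w = @(u) [u(1,:); u(1,:).*u(2,:); ...
               u(3,:)/(gamma-1) + 0.5*u(1,:).*u(2,:).^2];

% W = (rho, rho*U, E) -> (rho, U, p)
w2usol = @(w) [w(1,:); w(2,:)./w(1,:); ...
               (gamma-1)*(w(3,:) - 0.5*w(2,:).^2./w(1,:))];

% (rho, U, p) -> F = (rho*U, rho*U^2 + p, (E + p)*U)
usol2f = @(u) [u(1,:).*u(2,:); u(1,:).*u(2,:).^2 + u(3,:); ...
               (gamma/(gamma-1)*u(3,:) + 0.5*u(1,:).*u(2,:).^2).*u(2,:)];

% W -> F(W), by composition
w2f = @(w) usol2f(w2usol(w));

% Example: Sod's left and right states stored column-wise
usol = [8, 1; 0, 0; 10/gamma, 1/gamma];
w    = usol2w(usol);
F    = w2f(w);
```

For a fluid at rest the flux reduces to $(0, p, 0)^t$, and converting back and forth between `usol` and `w` returns the original values up to rounding.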


All these functions are written with a concern for transparency with respect
to the mathematical formulas. For this purpose, the vectorial programming
capabilities of MATLAB were used. Let us explain this technique in detail for
the predictor step of the Lax-Wendroff scheme (see Fig. 10.4):

• the flux F(W) is computed from W values for all j = 1,..., M components

F = HYP_trans_w_f(w);

• the artificial dissipation vector is added following (10.42); we use the
MATLAB built-in function diff to compute the differences $W_j - W_{j-1}$; these
differences are computed along the rows of the array w and only for
$j \ge 2$; according to the left-boundary conditions, the artificial
dissipation vector is completed by zeros for j = 1:

F = F-Ddx*[zeros(3,1) diff(w,1,2)];

• the intermediate solution $\tilde{W}$ is computed only for the components
j = 1,...,M − 1:

wtilde=0.5*(w(:,1:M-1)+w(:,2:M))-0.5*dt/dx*(F(:,2:M)-F(:,1:M-1));

A similar MATLAB code is written for the corrector step, bearing in mind
that for this step, right-boundary conditions apply, and consequently,
only the components j = 2,...,M − 1 of $W^{n+1}$ are computed:

Ftilde = HYP_trans_w_f(wtilde);
Ftilde=Ftilde-Ddx*[diff(wtilde,1,2) zeros(3,1)];
w(:,2:M-1)=w(:,2:M-1)-dt/dx*(Ftilde(:,2:M-1)-Ftilde(:,1:M-2));


Particular attention was devoted to the implementation of Roe’s scheme,
which requires a separate function HYP_flux_roe to compute the conservative
flux $\Phi$. In order to reduce memory storage, the flux at the interface
$(j + \frac12)$ is computed using, for once (and once is not a habit!), a for
loop and several local variables that can be easily identified from the
mathematical relations. Note also that the analytical form (10.13) for
$P_{j+1/2}^{-1}$ was used instead of the (time-consuming) MATLAB built-in
function inv, which calculates the inverse of a matrix.

Thermal Engineering: Optimization of an
Industrial Furnace


Project Summary 


Level of difficulty: 2 


Keywords: Finite element method, Laplace differential operator, 
direct problem, inverse problem 


Application fields: Thermal engineering, optimization 


11.1 Introduction 


In this chapter we deal with a simple but realistic optimization problem. We
have to find the optimal temperature of an industrial furnace in which resin
pieces, such as car bumpers, are made. The heating system is based on
electric resistances, and the first part of this study is to compute the
temperature field inside the oven when the values of the resistances are known.
This work is called the direct problem: the resistances’ values are known and
the temperature field is unknown. It is important here to emphasize that the
mechanical properties of the bumper depend on the temperature during the
cooking; so the second part of the study is devoted to computing the
resistances’ values in order to maintain the bumper temperature at the “good”
value. This optimization problem is called an inverse problem: the temperature
is an input and the resistances’ values are outputs.

The computation of the temperature field is performed with the finite 
element method. Only the main features of this method are recalled here; for 
further details we refer to Ciarlet (1978), Norrie and de Vries (1973), and 
Zienkiewicz (1971).

11.2 Formulation of the Problem


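For the sake of simplicity we limit the geometry of the problem to elementary
shapes: the bumper is a rectangle placed in a rectangular domain $\Omega$
representing the oven; the edges are referred to as the boundary
$\partial\Omega$ (see Fig. 11.1). This boundary is the union of three nonempty
parts: $\partial\Omega_D$, $\partial\Omega_N$, and $\partial\Omega_F$,
satisfying the following conditions:

\[ \partial\Omega = \partial\Omega_D \cup \partial\Omega_N \cup \partial\Omega_F
\quad \text{and} \quad
\partial\Omega_D \cap \partial\Omega_N = \partial\Omega_N \cap \partial\Omega_F
= \partial\Omega_F \cap \partial\Omega_D = \emptyset. \]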


The partial differential equation arising from the heat diffusion phe- 
nomenon in the oven can be written as 


\[ \text{Find } T \in V \text{ such that} \quad
\mathrm{div}\,[-K\,\mathrm{grad}\,T] = F \ \text{ in } \Omega. \tag{11.1} \]


For physical and mathematical reasons, the temperature field has to satisfy
some conditions on the wall of the oven. First we impose $T = T_D$ on
$\partial\Omega_D$; this is commonly referred to as a Dirichlet boundary
condition. Another condition rules the thermal flux across $\partial\Omega_N$.
This is referred to as a Neumann boundary condition. A last condition is
devoted to the temperature balance between the inside and the outside of the
oven. This Fourier boundary condition states that the heat transfer through
$\partial\Omega_F$ is proportional to $T - T_F$, the difference
between internal and external temperatures.

All these arguments are translated into mathematical terms, and we add 
them to the formulation of problem (11.1). They are summarized in 


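\[ \begin{cases}
T = T_D & \text{on } \partial\Omega_D, \\[4pt]
\displaystyle \sum_{i,j} K_{ij}\,\frac{\partial T}{\partial x_j}\,\nu_i = f & \text{on } \partial\Omega_N, \\[4pt]
\displaystyle \sum_{i,j} K_{ij}\,\frac{\partial T}{\partial x_j}\,\nu_i = g\,(T - T_F) & \text{on } \partial\Omega_F.
\end{cases} \tag{11.2} \]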





We have employed here the following notations: 


1. $T$ is the temperature in the domain $\Omega$.
2. $V$ is the set of all feasible temperatures.
3. $K \in \mathbb{R}^{2\times 2}$ is the thermal conductivity tensor. In a
homogeneous isotropic medium, we have $K = c\,I_2$, where $c$ is the heat
conductivity coefficient, and $I_2$ is the identity matrix.
4. The volume and surface heat sources are denoted by $F$ and $f$.
5. The ambient temperature (inside the oven) is set to the value $T_D$ on
$\partial\Omega_D$.
6. The outside temperature is set to the value $T_F$ on $\partial\Omega_F$.
7. $g$ is the heat transfer coefficient on $\partial\Omega_F$.
8. $\nu = (\nu_1, \nu_2)^t$ is the outward normal vector on $\partial\Omega$.

Fig. 11.1. Object and oven.





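For the sake of simplicity we assume here perfect thermal insulation, that
is, $f = 0$ and $g = 0$. All data necessary to deal with the problem are now
determined, and we use a Green formula,

\[ \forall\, T, T' \in V, \quad
\int_\Omega \mathrm{div}\,[-K\,\mathrm{grad}\,T]\;T'\,dx
= \int_\Omega \sum_{i,j} K_{ij}\,\frac{\partial T}{\partial x_j}\,\frac{\partial T'}{\partial x_i}\,dx
- \int_{\partial\Omega} \sum_{i,j} K_{ij}\,\frac{\partial T}{\partial x_j}\,T'\,\nu_i\,ds. \]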


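Let us now introduce the subspace $V^0 \subset V$ defined by
$V^0 = \{T' \in V,\ T'|_{\partial\Omega_D} = 0\}$,
and write the variational formulation of problem (11.1):

Find $T \in V^0 + T_D$ such that

\[ \forall\, T' \in V^0, \quad
\sum_{K \in \mathcal{T}_h} \int_K (\mathrm{grad}\,T')^t\,K\,\mathrm{grad}\,T\,dx
= \sum_{K \in \mathcal{T}_h} \int_K T'\,F\,dx. \tag{11.3} \]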


It has been proved that problem (11.3) is equivalent to problem (11.1)+(11.2) 
and has a unique solution T (see Ciarlet, 1978). 


11.3 Finite Element Discretization 


In a real case, the physical data of problem (11.3) are provided by
experimental measurements; they are not trivial and neither is the geometry of
the domain $\Omega$. Consequently, it is not possible to write an explicit
solution $T(x,y)$ of problem (11.1)+(11.2). This solution is estimated by means
of an approximation method such as the finite element method. This method uses
piecewise polynomial functions defined on triangles or rectangles in 2D
formulation (tetrahedra, hexahedra in 3D). In this study we first split the
domain $\Omega$ into triangular elements, gathered in a triangulation. In the
finite element theory a triangulation $\mathcal{T}_h$ (or mesh) of the domain
$\Omega$ is a set of triangles satisfying the following properties:

P= K, 


KET, 


0, 
VK, K'€T, KOK’ =< ora vertex common to K and Kk’, 
or an edge common to K and K”. 


Then a finite-dimensional vector subspace $V_h \subset V$ is introduced. A simple
example of such a subspace is provided by the so-called "Lagrange finite
element" of degree 1. For an arbitrary triangle $K$ in $\mathcal{T}_h$, with vertices $A_1$, $A_2$,
and $A_3$, an element $T_h$ of $V_h$ is defined by

$$
T_h(M) = T_h(A_1)\,\lambda_1 + T_h(A_2)\,\lambda_2 + T_h(A_3)\,\lambda_3,
\qquad (11.4)
$$

where $T_h(A_i)$ is the temperature value at $A_i$, one of the three vertices of
triangle $K$, while $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the barycentric coordinates of the point
$M$ in triangle $K$.


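Remark 11.1. Let $K$ be an arbitrary triangle with vertices $A_1$, $A_2$, and $A_3$.
The barycentric coordinates of point $M$ are three real numbers $\lambda_1$, $\lambda_2$, and
$\lambda_3$ such that $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and

$$
\overrightarrow{OM} = \lambda_1\,\overrightarrow{OA_1} + \lambda_2\,\overrightarrow{OA_2} + \lambda_3\,\overrightarrow{OA_3}.
$$

If $x, y$ are the Cartesian coordinates of $M$, then the barycentric coordinates
$\lambda_1, \lambda_2$ are a solution of the linear system

$$
\begin{aligned}
x &= x(A_3) + [x(A_1) - x(A_3)]\,\lambda_1 + [x(A_2) - x(A_3)]\,\lambda_2, \\
y &= y(A_3) + [y(A_1) - y(A_3)]\,\lambda_1 + [y(A_2) - y(A_3)]\,\lambda_2,
\end{aligned}
$$

where $(x(A_i), y(A_i))$ are the Cartesian coordinates of vertex $A_i$. The uniqueness
of these values is guaranteed if and only if $A_1$, $A_2$, and $A_3$ are not on a
straight line.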
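The linear system of Remark 11.1 can be solved directly in MATLAB. The following lines are only a minimal sketch: the triangle `XY`, the point `M`, and all variable names are illustrative choices, not data from the project files.

```matlab
% Vertices of a sample triangle K, one row per vertex (illustrative data).
XY = [0 0; 1 0; 0 1];          % A1, A2, A3
M  = [0.25 0.5];               % point whose barycentric coordinates we want

% 2x2 system of Remark 11.1: the unknowns are lambda1 and lambda2.
J = [XY(1,1)-XY(3,1), XY(2,1)-XY(3,1);
     XY(1,2)-XY(3,2), XY(2,2)-XY(3,2)];
rhs = [M(1)-XY(3,1); M(2)-XY(3,2)];

if abs(det(J)) < eps
    error('Degenerate triangle: the three vertices are aligned.');
end
lam12  = J \ rhs;
lambda = [lam12; 1 - sum(lam12)];   % lambda3 follows from the sum rule

% Check: M is recovered as the barycentric combination of the vertices.
M_check = lambda' * XY;
```

For this triangle the three coordinates are 0.25, 0.25, and 0.5, and `M_check` coincides with `M`.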


The definition (11.4) of the approximate temperature $T_h$ leads to the new
relation

$$
T_h(M) = T_h(A_3) + [T_h(A_1) - T_h(A_3)]\,\lambda_1 + [T_h(A_2) - T_h(A_3)]\,\lambda_2.
\qquad (11.5)
$$

We introduce then the subspace $V_h^0 \subset V_h$ by

$$
V_h^0 = \{T_h \in V_h,\ T_h|_{\partial\Omega_D} = 0\}.
$$

Let $T_D$ be the element of $V_h$ whose components are all zero, except for $T_D(A_i)$,
with point $A_i$ located on the boundary $\partial\Omega_D$, whose values come from (11.2).
The discrete variational formulation of problem (11.3) is then

Find $T_h \in T_D + V_h^0$ such that

$$
\forall\, T'_h \in V_h^0, \quad
\sum_{K \in \mathcal{T}_h} \int_K (\mathrm{grad}\,T'_h)^t\, K\, \mathrm{grad}\,T_h\,dx
= \sum_{K \in \mathcal{T}_h} \int_K T'_h\,F\,dx.
\qquad (11.6)
$$

11.4 Implementation


Formulation (11.6) uses integral calculation on triangles of $\mathcal{T}_h$. Before going
further into the details, we examine these terms when $K$ is an arbitrary
triangle. One has to compute the value of

$$
\int_K (\mathrm{grad}\,T'_h)^t\, K\, \mathrm{grad}\,T_h\,dx
\quad\text{and}\quad
\int_K T'_h\,F\,dx.
$$





Matrix Computation 


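The vector $\mathrm{grad}\,T_h$ has to be calculated for each $T_h$ in $V_h$ and each $K$ in $\mathcal{T}_h$.
We first write

$$
\frac{\partial T_h}{\partial \lambda_i}
= \frac{\partial T_h}{\partial x}\,\frac{\partial x}{\partial \lambda_i}
+ \frac{\partial T_h}{\partial y}\,\frac{\partial y}{\partial \lambda_i},
\quad \text{for } i = 1, 2.
\qquad (11.7)
$$

Then, using (11.4) and (11.5), we get

$$
\begin{aligned}
\frac{\partial T_h}{\partial \lambda_1} &= T_h(A_1) - T_h(A_3), &
\frac{\partial x}{\partial \lambda_1} &= x(A_1) - x(A_3), &
\frac{\partial y}{\partial \lambda_1} &= y(A_1) - y(A_3), \\
\frac{\partial T_h}{\partial \lambda_2} &= T_h(A_2) - T_h(A_3), &
\frac{\partial x}{\partial \lambda_2} &= x(A_2) - x(A_3), &
\frac{\partial y}{\partial \lambda_2} &= y(A_2) - y(A_3).
\end{aligned}
\qquad (11.8)
$$

So a new formulation of (11.7) is

$$
\begin{pmatrix}
T_h(A_1) - T_h(A_3) \\
T_h(A_2) - T_h(A_3)
\end{pmatrix}
=
\begin{pmatrix}
x(A_1) - x(A_3) & y(A_1) - y(A_3) \\
x(A_2) - x(A_3) & y(A_2) - y(A_3)
\end{pmatrix}
\begin{pmatrix}
\dfrac{\partial T_h}{\partial x} \\[6pt]
\dfrac{\partial T_h}{\partial y}
\end{pmatrix}.
\qquad (11.9)
$$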

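The determinant of the matrix in (11.9) is

$$
\Delta_K = (x(A_1) - x(A_3))\,(y(A_2) - y(A_3)) - (x(A_2) - x(A_3))\,(y(A_1) - y(A_3)).
$$

Since $|\Delta_K|$ is twice the area of triangle $K$, the matrix in (11.9) is invertible when
$K$ is not a flat triangle (i.e., the three vertices are not on a straight line). We
introduce then an array $[dl\,T_h]_K$ and a matrix $B_K$ by

$$
[dl\,T_h]_K =
\begin{pmatrix}
T_h(A_1) \\ T_h(A_2) \\ T_h(A_3)
\end{pmatrix}
\quad\text{and}\quad
B_K = \frac{1}{\Delta_K}
\begin{pmatrix}
y(A_2) - y(A_3) & y(A_3) - y(A_1) & y(A_1) - y(A_2) \\
x(A_3) - x(A_2) & x(A_1) - x(A_3) & x(A_2) - x(A_1)
\end{pmatrix},
$$

so that inverting (11.9) gives $\mathrm{grad}\,T_h = B_K\,[dl\,T_h]_K$ on $K$, and write

$$
\int_K (\mathrm{grad}\,T'_h)^t\, K\, \mathrm{grad}\,T_h\,dx = [dl\,T'_h]_K^t\,[A_K]\,[dl\,T_h]_K.
$$

Matrix $[A_K]$ is the element matrix, and is computed by

$$
[A_K] = \frac{1}{2}\, c_K\, |\Delta_K|\, B_K^t B_K.
$$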


Remark 11.2. The value of $c_K$, the thermal conductivity coefficient, is different
in the air and in the resin, and so depends on $K$.
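As an illustration, the element matrix of one triangle can be computed in a few lines. This is a hedged sketch rather than the code of the project files: the names `XY`, `cK`, `DeltaK`, `BK`, and `AK` are chosen here for convenience, the medium is assumed isotropic on $K$, and `BK` holds the gradients of the three barycentric functions, obtained by inverting (11.9).

```matlab
XY = [0 0; 1 0; 0 1];   % vertices A1, A2, A3 of a sample triangle (rows)
cK = 1;                 % conductivity coefficient on K (assumed value)

x = XY(:,1);  y = XY(:,2);

% Determinant of the matrix in (11.9); |DeltaK| is twice the area of K.
DeltaK = (x(1)-x(3))*(y(2)-y(3)) - (x(2)-x(3))*(y(1)-y(3));

% Column i of BK is the (constant) gradient of the i-th barycentric function.
BK = [y(2)-y(3), y(3)-y(1), y(1)-y(2);
      x(3)-x(2), x(1)-x(3), x(2)-x(1)] / DeltaK;

% Element matrix: area of K times cK times BK'*BK.
AK = 0.5 * cK * abs(DeltaK) * (BK' * BK);
```

Each row of `AK` sums to zero, since a constant temperature has a vanishing gradient; this is a convenient check when debugging.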


Right-Hand Side Computation 


We assume in the following that the heat source function $F_r$, associated with
an electrical resistance located at point $P_r(x_r, y_r)$, has the Gaussian form

$$
F_r(x, y) \propto \exp\left[-d^2(x, y)\right],
$$

where $d^2(x, y)$ is the squared distance $(x - x_r)^2 + (y - y_r)^2$ to $P_r$, scaled by
the square of a radius $R$, so using the previous notation we may write

$$
\int_\Omega F_r\,dx = F_0
\quad\text{and}\quad
\int_K T'_h\,F_r\,dx = [dl\,T'_h]_K^t\,[b_K],
$$

where the array $[b_K]$ is the element right-hand side, computed by means of
a numerical integration formula on $K$.


The Linear System 
Gathering all these results, we rewrite problem (11.6) in the new form

Find $T_h \in T_D + V_h^0$ such that

$$
\forall\, T'_h \in V_h^0, \quad
\sum_{K \in \mathcal{T}_h} [dl\,T'_h]_K^t\,[A_K]\,[dl\,T_h]_K
= \sum_{K \in \mathcal{T}_h} [dl\,T'_h]_K^t\,[b_K].
\qquad (11.10)
$$

In this formulation, the array $[dl\,T'_h]_K$ represents the temperature values
of an arbitrary function $T'_h$ in $V_h^0$. It is an array whose three components are
the temperature values at the vertices of $K$, an arbitrary triangle of $\mathcal{T}_h$. When
we compute the summation in (11.10), element $K$ runs over all triangles of $\mathcal{T}_h$,
so all functions $T'_h$ in $V_h^0$ are taken into account. We then rewrite (11.10) as

Find $T_h \in T_D + V_h^0$ such that

$$
\forall\, T'_h \in V_h^0, \quad
[dl\,T'_h]^t\,[A]\,[dl\,T_h] = [dl\,T'_h]^t\,[b].
\qquad (11.11)
$$

Let $nv$ be the number of vertices in triangulation $\mathcal{T}_h$; then $[A]$ is a square
matrix of $\mathbb{R}^{nv \times nv}$ and $[b]$ is an array of $\mathbb{R}^{nv}$. Note that

$$
[dl\,T_h] = [T_h(A_1), T_h(A_2), \ldots, T_h(A_{nv})]^t
$$

and

$$
[dl\,T'_h] = [T'_h(A_1), T'_h(A_2), \ldots, T'_h(A_{nv})]^t
$$

are arrays whose $nv$ components are the temperature values at the vertices of
$\mathcal{T}_h$. To end, we remark that (11.6) is a linear system with $nv$ equations and
$nv$ unknowns:

$$
[A]\,[dl\,T_h] = [b].
\qquad (11.12)
$$


11.5 Boundary Conditions 


It is time now to take the boundary conditions into account. The condition
$T'_h = 0$ on $\partial\Omega_D$, specified in the definition of the space $V_h^0$, involves
important modifications of the linear system (11.12). For the sake of simplicity,
we shall assume in the following lines that the vertices located on
$\partial\Omega_D$ have the largest numbers when the points of triangulation $\mathcal{T}_h$ are ordered.
More precisely, the numbers of these $nv_D$ vertices are supposed to be
$nv - nv_D + 1, nv - nv_D + 2, \ldots, nv$. Any element of the finite-dimensional
space $V_h$ is then an array of $nv$ real components, and any element of the
subspace $V_h^0$ is an array whose last $nv_D$ components are null. So the linear
system (11.12) arising from (11.11) seems to have only $(nv - nv_D)$ rows but
$nv$ unknowns! Fortunately, since the solution $T_h$ of problem (11.11) belongs to
the space $T_D + V_h^0$, its last $nv_D$ components are known: they are determined by
the data $T_D$ associated with the Dirichlet boundary condition $T_h|_{\partial\Omega_D} = T_D$.
Finally, the linear system (11.12) has $(nv - nv_D)$ unknowns for the same number
of equations! Nevertheless, this treatment has heavy consequences for the
computer formulation of (11.12). We write first

$$
\begin{pmatrix} A_1 & A_2 \\ A_2^t & A_3 \end{pmatrix}
\begin{pmatrix} T_h \\ T_{h,D} \end{pmatrix}
=
\begin{pmatrix} b \\ b_D \end{pmatrix}.
$$

The square matrix $A_1$ is of order $(nv - nv_D)$, $A_2$ has $(nv - nv_D)$ rows
and $nv_D$ columns, and $A_3$ is a square matrix of order $nv_D$. When we take
the condition $T'_h|_{\partial\Omega_D} = 0$ into account, we see that the last $nv_D$ rows of the
linear system vanish. These rows are replaced by the $nv_D$ relations $T_h|_{\partial\Omega_D} =
T_{h,D} = T_D$, so the linear system is now

$$
\begin{pmatrix} A_1 & A_2 \\ 0 & I \end{pmatrix}
\begin{pmatrix} T_h \\ T_{h,D} \end{pmatrix}
=
\begin{pmatrix} b \\ T_D \end{pmatrix},
$$

where $I$ is the identity matrix of order $nv_D$. For matrix storage reasons it is
important to preserve the symmetry of the initial problem. A final modification
is then necessary: the matrix $A_2$ is eliminated in order to obtain

$$
\begin{pmatrix} A_1 & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} T_h \\ T_{h,D} \end{pmatrix}
=
\begin{pmatrix} b - A_2 T_D \\ T_D \end{pmatrix},
$$

which is the symmetric linear system solved by the computer.
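The elimination described above can be sketched on a small system. The matrix `A`, the right-hand side `b`, and the boundary values `TD` below are illustrative numbers, not data from the oven problem, and the variable names are assumptions made for this example; as in the text, the Dirichlet vertices are numbered last.

```matlab
% Small symmetric system standing in for (11.12); illustrative data only.
A  = [ 4 -1 -1  0;
      -1  4  0 -1;
      -1  0  4 -1;
       0 -1 -1  4];
b   = [1; 2; 3; 4];
nv  = 4;  nvD = 2;           % the last nvD vertices lie on the Dirichlet boundary
TD  = [50; 100];             % prescribed temperatures at these vertices

idxI = 1:nv-nvD;             % vertices carrying the unknowns
idxD = nv-nvD+1:nv;          % Dirichlet vertices

A1 = A(idxI,idxI);
A2 = A(idxI,idxD);

% Symmetric modified system: [A1 0; 0 I] * [Th; ThD] = [b - A2*TD; TD].
Amod = [A1,                 zeros(nv-nvD,nvD);
        zeros(nvD,nv-nvD),  eye(nvD)];
bmod = [b(idxI) - A2*TD; TD];

T = Amod \ bmod;             % last nvD entries reproduce TD exactly
```

The modified matrix stays symmetric, so a symmetric storage scheme or a Cholesky-type solver remains applicable.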





11.5.1 Modular Implementation 


When implementing the finite element method, different logical steps have to 
be taken into account: 


1. definition of the triangulation $\mathcal{T}_h$,
2. construction of the linear system (11.12),
3. introduction of the boundary conditions,
4. solution of the modified linear system,
5. visualization of the results.


Any scientific package has to deal with all elements of this list: there exists 
a distinct procedure corresponding to each step encountered during the im- 
plementation. These procedures are called modules. Results (output) of the 
kth-step module are data (input) for (k + 1)th-step module. Several modules 
may exist for the same logical step, in which case they have to share similar 
formatted input and provide similar formatted output. 


11.5.2 Numerical Solution of the Problem 


In solving problem (11.11), the very first step is the mesh construction.
Numerous packages are devoted to this work, and 2D meshes are easily created.
See, for example, the mesh displayed in Fig. 11.2(a), obtained with the INRIA
code emc2.¹ There are altogether 304 triangles and 173 vertices. Another
mesh, displayed in Fig. 11.2(b), was computed by the MATLAB "toolbox"
PDE-tool, with 1392 triangles and 732 vertices.

¹ http://www-rocql.inria.fr/gamma/cdrom/www/emc2/eng.htm.

Fig. 11.2. Mesh of the domain. (a) coarse; (b) fine.
[Panel titles: "Finite element (coarse) mesh", "Finite element (fine) mesh"; both axes range from -1 to 1.]

The mesh description, as provided by the code emc2, is summarized in the
following list:

1. Nbpt, Nbtri (two integers): number of vertices (points), number of triangles;
2. List of all vertices: for Ns=1,Nbpt
   • Ns, Coorpt[Ns,1], Coorpt[Ns,2], Refpt[Ns] (one integer, two reals, one
     integer): vertex number, coordinates, and boundary reference;
3. List of all triangles: for Nt=1,Nbtri
   • Nt, Numtri[Nt,1:3], Reftri[Nt] (five integers): triangle number, three
     vertex numbers, and medium reference (air or resin).
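Reading a file laid out as in the list above takes only a few calls to `fscanf`. The sketch below first writes a tiny two-triangle mesh to a temporary file and then reads it back; the file name, the sample data, and the intermediate arrays `P` and `E` are assumptions made for the example, and a real emc2 file may contain additional header or trailer information that this sketch does not handle.

```matlab
% Write a tiny two-triangle mesh in the layout described above (illustrative data).
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '4 2\n');                                   % Nbpt, Nbtri
fprintf(fid, '%d %g %g %d\n', [1 -1 -1 1; 2 1 -1 1; 3 1 1 2; 4 -1 1 2]');
fprintf(fid, '%d %d %d %d %d\n', [1 1 2 3 1; 2 1 3 4 1]');
fclose(fid);

% Read it back.
fid = fopen(fname, 'r');
n = fscanf(fid, '%d', 2);
Nbpt = n(1);  Nbtri = n(2);
P = fscanf(fid, '%f', [4, Nbpt])';    % rows: Ns, x, y, Refpt
E = fscanf(fid, '%d', [5, Nbtri])';   % rows: Nt, three vertex numbers, Reftri
fclose(fid);
delete(fname);

Coorpt = P(:,2:3);   Refpt  = P(:,4);
Numtri = E(:,2:4);   Reftri = E(:,5);
```

Once `Coorpt` and `Numtri` are available, a call such as `triplot(Numtri, Coorpt(:,1), Coorpt(:,2))` gives a quick visual check of the mesh.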


Exercise 11.1. 1. Create a mesh (or read one of the data files provided with
   the procedures). Check the contents of arrays Coorpt and Numtri.
2. Compute the element matrix and right-hand side for each triangle.
3. Write a procedure that assembles the linear system from element data.
4. Modify the linear system in order to take the boundary conditions into
   account.²
5. Solve the resulting linear system.
6. Visualize the results; plot the isotherm curves.

² For this experiment $\partial\Omega_D$ is the union of the lines $y = -1$ and $y = 1$, $\partial\Omega_N$ is
the union of the lines $x = -1$ and $x = 1$, and $\partial\Omega_F = \emptyset$.


Hint: Use the following algorithm to assemble A and b:

    for K = 1:Nbtri
        % (a) read data for triangle K
        %     XY  = coordinates of the vertices of triangle K
        %     Num = vertex numbers of triangle K
        % (b) compute element matrix AK(3,3) and right-hand side bK(3)
        % (c) add AK to the global matrix A(Nbpt,Nbpt)
        for i = 1:3
            for j = 1:3
                A(Num(i),Num(j)) = A(Num(i),Num(j)) + AK(i,j);
            end
        end
        % (d) add bK to the global right-hand side b(Nbpt)
        for i = 1:3
            b(Num(i)) = b(Num(i)) + bK(i);
        end
    end
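The assembly loop of the hint can be tried on a two-triangle mesh of the unit square. This is a sketch under stated assumptions, not the solution of Sect. 11.8: the mesh data are invented for the example, the conductivity array `cTri` is a hypothetical name, the heat source is taken to be zero, and the element matrix is computed with the formula $\frac{1}{2} c_K |\Delta_K| B_K^t B_K$ for an isotropic medium.

```matlab
Coorpt = [0 0; 1 0; 1 1; 0 1];     % vertex coordinates (illustrative mesh)
Numtri = [1 2 3; 1 3 4];           % vertex numbers of each triangle
cTri   = [1; 1];                   % conductivity per triangle (assumed values)

Nbpt  = size(Coorpt,1);
Nbtri = size(Numtri,1);
A = sparse(Nbpt,Nbpt);             % sparse storage: most entries stay zero
b = zeros(Nbpt,1);

for K = 1:Nbtri
    % (a) data of triangle K
    Num = Numtri(K,:);
    x = Coorpt(Num,1);  y = Coorpt(Num,2);

    % (b) element matrix and right-hand side
    DeltaK = (x(1)-x(3))*(y(2)-y(3)) - (x(2)-x(3))*(y(1)-y(3));
    BK = [y(2)-y(3), y(3)-y(1), y(1)-y(2);
          x(3)-x(2), x(1)-x(3), x(2)-x(1)] / DeltaK;
    AK = 0.5 * cTri(K) * abs(DeltaK) * (BK' * BK);
    bK = zeros(3,1);               % no heat source in this sketch

    % (c), (d) accumulation in the global matrix and right-hand side
    A(Num,Num) = A(Num,Num) + AK;
    b(Num)     = b(Num) + bK;
end
```

Indexing with the vector `Num` replaces the two nested loops over `i` and `j`; this works because the three vertex numbers of a triangle are distinct. For large meshes, collecting the triplets (row, column, value) and calling `sparse` once is usually faster than updating a sparse matrix inside the loop.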


A solution of this exercise is proposed in Sect. 11.8. A computed
temperature field is displayed in Fig. 11.3(a), representing the variations
of temperature within the domain $\Omega$. It corresponds to the following
data: $T_D = 50°$ Celsius on the upper part of the oven ($y = 1$), $T_D = 100°$
on the lower part ($y = -1$), and no heating sources ($F = 0$, $f = 0$). Another
temperature field is displayed in Fig. 11.3(b); it shows the temperature
variations for the same boundary conditions but with four heating
resistances (common value 25 000). It is obvious in Fig. 11.3 that the
temperature value in the bumper is far from 250°, which is supposed to be the
ideal one in our study. To fix this problem we have to increase the resistance
values. Yes, but by how much? In the previous computation we have used
the resistance values as data and obtained the temperature field inside the
oven as result; this is referred to as the direct formulation of the problem. But
what we are interested in is the resistance values that produce the optimal
temperature inside the bumper; this is called the inverse problem. We shall
address the inverse problem in the following section.








Fig. 11.3. Temperature fields. (a) without resistance; (b) with four resistances.
[Panel titles: "Temperature field without resistance", "Global temperature field".]


11.6 Inverse Problem Formulation 


We emphasize now an important property of (11.1)-(11.2): the problem is
linear. This means that if $T'$ is the unique solution of problem (11.1)-(11.2)
corresponding to data $\{F', T'_D, f'\}$, and $T''$ is the unique solution corresponding
to data $\{F'', T''_D, f''\}$, then $\alpha T' + \beta T''$ is the unique solution corresponding
to data $\{\alpha F' + \beta F'', \alpha T'_D + \beta T''_D, \alpha f' + \beta f''\}$ for any real numbers $\alpha$ and $\beta$. In
order to determine the values of the resistances that lead to a correct heating
of the object, this property is of great interest. Assume that there are $nwr$
heating resistances and that the boundary conditions are temperature value
$T_D$ imposed on $\partial\Omega_D$ and heat flux $f$ imposed on $\partial\Omega_N$. Then the corresponding
temperature field $T$ is written as

$$
T = T_0 + \sum_{k=1}^{nwr} \alpha_k T_k,
$$


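where $\alpha_k$ is the $k$th resistance value, $T_k$ represents the temperature field
when resistance $k$ is the unique resistance heating the oven, and $T_0$ is the
field obtained without any heating term. These coefficients $\alpha_k$ are the
unknowns of the inverse problem, and we are going to compute the
values corresponding to the desired temperature $T_{opt}$ by minimizing the quantity

$$
J(\alpha) = \int_S \Big( T_{opt}(x) - T_0(x) - \sum_{k=1}^{nwr} \alpha_k T_k(x) \Big)^2 dx.
$$

Here $\alpha = (\alpha_1, \ldots, \alpha_{nwr})^t$ and $S$ stands for the bumper. The quadratic
functional $J$ is a strictly convex function of the variable $\alpha$, and its unique
minimum is reached when $\nabla J(\alpha) = 0$. For $k = 1, 2, \ldots, nwr$, the $k$th component
of the gradient is

$$
\frac{\partial J}{\partial \alpha_k}
= -2 \int_S \Big( T_{opt}(x) - T_0(x) - \sum_{k'=1}^{nwr} \alpha_{k'} T_{k'}(x) \Big)\, T_k(x)\,dx,
$$

and the minimum is reached when

$$
\sum_{k'=1}^{nwr} \alpha_{k'} \int_S T_{k'}(x)\, T_k(x)\,dx = \int_S \big( T_{opt}(x) - T_0(x) \big)\, T_k(x)\,dx,
$$

for $k = 1, 2, \ldots, nwr$. We introduce now the matrix $\tilde{A} \in \mathbb{R}^{nwr \times nwr}$ and the
array $\tilde{b} \in \mathbb{R}^{nwr}$ by

$$
\tilde{A}_{k,k'} = \int_S T_k(x)\, T_{k'}(x)\,dx
\quad\text{and}\quad
\tilde{b}_k = \int_S \big( T_{opt}(x) - T_0(x) \big)\, T_k(x)\,dx.
$$

The optimal $\tilde{\alpha} = (\tilde{\alpha}_1, \ldots, \tilde{\alpha}_{nwr})^t$ is the unique solution of the linear system

$$
\tilde{A}\,\tilde{\alpha} = \tilde{b}.
\qquad (11.13)
$$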
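The structure of system (11.13) can be checked on synthetic data. In the sketch below everything is an assumption made for illustration: the nodal fields are invented, and the diagonal matrix `M` merely stands in for the integration over the bumper $S$. The target is built as an exact combination of the fields, so the computed coefficients must reproduce the weights used to build it.

```matlab
M    = diag([1 2 2 1]);          % stand-in for the integration weights on S
T0   = [50; 60; 70; 80];         % field without resistance (illustrative values)
Tk   = [1 0; 2 1; 1 2; 0 1];     % columns: nodal values of the fields T_1 and T_2
Topt = T0 + Tk*[3; 2];           % target reachable with alpha = (3, 2)

Atil = Tk' * M * Tk;             % entries: integral of T_k * T_k' over S
btil = Tk' * M * (Topt - T0);    % entries: integral of (Topt - T0) * T_k over S

alpha = Atil \ btil;             % optimal resistance values
```

Since the columns of `Tk` are linearly independent and `M` is positive definite, `Atil` is symmetric positive definite, in agreement with the strict convexity of $J$.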


11.7 Implementation of the Inverse Problem 





Exercise 11.2. 1. Compute and solve the linear system (11.13) arising from
   the optimization problem.
2. Compute and plot the temperature field corresponding to the optimal
   value. Comment on the results.


A solution of this exercise is proposed in Sect. 11.8. We have
first to solve the $nwr + 1$ direct problems in order to compute the temperature
fields $T_0, T_1, \ldots, T_{nwr}$. They are obtained by $nwr + 1$ calls of the
program written to solve the direct problem. Each computation corresponds
to a distinct value of the data $T_D$, $f$, and $F$ (note that the location of
the resistances in the oven is a geometrical datum of great importance). The
corresponding temperature fields are then stored in $nwr + 1$ distinct arrays.

Now we solve problem (11.1)-(11.2) without any heating term ($F = 0$),
but with a temperature $T_D \neq 0$ and thermal flux $f = 0$ given on the boundary.
The solution of this problem is denoted by $T_0$, and displayed in Fig. 11.3(a).
We can see that the boundary conditions are well respected: the temperature
value is $T = 100$ when $y = -1$ and $T = 50$ when $y = 1$. The heat conductivity
coefficient is set to the value $c = 1$ within the air and $c = 10$ within the object
(air is a good insulation medium). The vanishing thermal flux on the other
parts of the boundary corresponds to isotherm lines parallel to the normal
vector when $x = -1$ and $x = 1$.








Fig. 11.4. Temperature fields. (a) $T_2$ field; (b) optimized field.
[Panel titles: "Temperature field for resistance n2", "Optimized temperature field".]


Then we solve $nwr$ successive problems (11.1)-(11.2) (one problem per
resistance). Each case consists in computing the temperature field when only
one resistance is heating the oven. The boundary conditions are temperature
$T_D = 0$ on $\partial\Omega_D$ and thermal flux $f = 0$ on $\partial\Omega_N$ for all cases. The $nv$
components of array $T_k$ represent the temperature field related to the $k$th
resistance. Figure 11.4(a) displays the temperature field associated with a
single heating resistance. Note the tiny values of the temperature.

We notice again that the boundary conditions are well satisfied: the
temperature vanishes when $y = -1$ and $y = 1$ (Dirichlet condition), and the
Neumann condition (null flux condition) leads to isotherm lines perpendicular
to lines $x = -1$ and $x = 1$.

To solve the inverse problem, we have now to construct the linear system
(11.13) and then compute the terms

$$
\int_S T_k(x)\, T_{k'}(x)\,dx
$$

from the temperature fields $T_k$ and $T_{k'}$ computed in the previous step. This
computation is performed in the same way as before: we first write the integral
term on the complete object as a summation of integral terms on all triangles
of the object:

$$
\int_S T_k(x)\, T_{k'}(x)\,dx = \sum_{K \subset S} \int_K T_k(x)\, T_{k'}(x)\,dx.
$$

Then any integral on $K$ is evaluated using the expression for $T_k(x)$ and
$T_{k'}(x)$ in triangle $K$,

$$
T_k(x) = T_k(A_3) + [T_k(A_1) - T_k(A_3)]\,\lambda_1 + [T_k(A_2) - T_k(A_3)]\,\lambda_2.
$$

In this formula $\lambda_i$ is the $i$th barycentric coordinate of the point $M(x,y)$ in $K$,
and $A_i$ is one of the vertices of triangle $K$. So we may write

$$
\int_K T_k(x)\, T_{k'}(x)\,dx = [dl\,T_{k,K}]^t\,[M_K]\,[dl\,T_{k',K}].
$$

This leads us to introduce the matrix $[M_K]$, the so-called element mass matrix.
The element mass matrix associated with the Lagrange finite triangular
element of degree 1 is

$$
[M_K] = \frac{|\Delta_K|}{24}
\begin{pmatrix}
2 & 1 & 1 \\
1 & 2 & 1 \\
1 & 1 & 2
\end{pmatrix}.
$$

So in computing the linear system (11.13), the matrix coefficient $\tilde{A}_{k,k'}$
is obtained by summation over all triangles lying inside the object of the
terms $[dl\,T_{k,K}]^t [M_K] [dl\,T_{k',K}]$. The right-hand side $\tilde{b}$ is computed in the same
way. An example of calculation, corresponding to the case of four heating
resistances, is displayed in Fig. 11.5(a). We may see there the optimized
temperature field obtained after computation of coefficients $\alpha_k$. Figure 11.5(b)
displays the solution corresponding to six heating resistances. The value of
the temperature within the rectangle $[-0.5, 0.5] \times [-0.2, 0.2]$ is very near to
the target value (250°).
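The element mass matrix is easy to validate on a reference triangle, because the integrals of low-degree polynomials are known exactly. The following sketch uses the unit right triangle; the variable names are chosen for this example only, and the two fields are the functions $x$ and $1$, given by their values at the vertices.

```matlab
XY = [0 0; 1 0; 0 1];              % unit right triangle (area 1/2)
x = XY(:,1);  y = XY(:,2);
DeltaK = (x(1)-x(3))*(y(2)-y(3)) - (x(2)-x(3))*(y(1)-y(3));

% P1 element mass matrix.
MK = abs(DeltaK)/24 * [2 1 1; 1 2 1; 1 1 2];

Tk  = x;                           % nodal values of the field T_k(x,y) = x
Tkp = ones(3,1);                   % nodal values of the field T_k'(x,y) = 1

I_xx = Tk'  * MK * Tk;             % integral of x^2 over K, exact value 1/12
I_x1 = Tk'  * MK * Tkp;            % integral of x   over K, exact value 1/6
area = Tkp' * MK * Tkp;            % integral of 1   over K, exact value 1/2
```

The three quantities are reproduced without quadrature error, since the product of two affine functions is integrated exactly by $[M_K]$.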

Fig. 11.5. Optimized temperature fields. (a) four resistances; (b) six resistances.

It is very important to notice that the optimal value of coefficients $\alpha_k$
depends on the number of resistances but also on the position of these
resistances in relation to the object. An interesting development of this study is to
try to optimize the layout of the resistances inside the oven. The practical aim
of such an additional study should be the optimization of the thermal power
dissipated by the resistances. In this particular case, we want to optimize the
temperature value in the object and the thermal (or electric) energy used to
warm the oven. This is modeled by adding a special term to the functional

$$
J(\alpha) = \int_S \Big( T_{opt}(x) - T_0(x) - \sum_{k=1}^{nwr} \alpha_k T_k(x) \Big)^2 dx + C \sum_{k=1}^{nwr} \alpha_k^2.
$$
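Assuming the added term is a quadratic penalty $C \sum_k \alpha_k^2$ on the resistance values, setting the gradient of the penalized functional to zero changes the linear system (11.13) into $(\tilde{A} + C\,I)\,\alpha = \tilde{b}$. The sketch below compares the two solutions on an invented $2 \times 2$ example; the matrix, the right-hand side, and the weight `C` are illustrative values only.

```matlab
Atil = [4 1; 1 3];                    % illustrative symmetric positive definite matrix
btil = [1; 2];
C    = 0.5;                           % penalty weight (assumed value)

alpha0 = Atil \ btil;                 % solution of (11.13), no penalty
alphaC = (Atil + C*eye(2)) \ btil;    % solution of the penalized problem

shrink = norm(alphaC) / norm(alpha0); % ratio below 1: the penalty reduces the power
```

Larger values of `C` favor small resistance values at the price of a poorer match with the target temperature, so the weight has to be tuned by experiment.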


Remark 11.3. You may get very small values (even negative) for the coefficients
$\alpha_k$. This means that the corresponding resistances are located too close
to the object and then have to cool it instead of heating it.


It may be seen in Fig. 11.6(a) that a device with six heating resistances
produces a larger "well heated" area than with four resistances. Note also
that this computation is performed with a rather coarse mesh (173 vertices
and 304 triangles). The results are satisfactory; they prove the efficiency of the
finite element method to solve this problem, and provide a validation of all
the procedures, and of the whole process. Nevertheless, in order to get more
realistic and more accurate results, it is necessary to solve the problem on a
"finer" mesh. We have proceeded to a second computation on a mesh with
732 vertices and 1392 triangles (and still six resistances). The final result is
plotted in Fig. 11.6(b), showing a true improvement, especially around the
object and the resistances. This improvement, predicted by the finite element
theory (the smaller the element size, the better the result), comes with an
increase of the computational time.





11.8 Solutions and Programs 


Solution of Exercise 11.1 


Fig. 11.6. Optimized temperature fields. (a) coarse mesh; (b) fine mesh.

The file THER_oven_ex1.m contains the procedure THER_oven_ex1, which
carries out the numerical experiment by defining the physical parameters of the
problem (location of the resistances, heat conductivity coefficients, boundary
temperature values). It calls the procedure of the file THER_oven.m, which
computes the corresponding temperature field.

The file THER_matrix_dir.m contains a procedure that builds the linear 
system arising from the heat equation problem. We notice that the right-hand 
side vanishes outside those elements that contain a resistance. The procedure 
of the file THER_local.m builds the right-hand side for given resistances co- 
ordinates. The procedure contained in file THER_elim.m takes the boundary 
conditions into account. 


Solution of Exercise 11.2 


The file THER_oven_ex2.m contains the procedure THER_oven_ex2, which 
computes the resistances’ values corresponding to the optimal temperature 
field. It calls the procedure of the file THER_matrix_inv.m computing matrix 
and right-hand side of the optimization problem. 


Remark 11.4. We also provide an interactive version of the solution of this 
project, allowing one to realize numerous numerical experiments by changing 
data through a graphical user interface. To launch the interface, just run the 
script Main from the subdirectory interactive. 


11.8.1 Further Comments 


In this last section we address the important point of the structure of matrix A.
The matrix is sparse; this property depends only on the approximation
scheme and the differential operator, which is here similar to the Laplacian
operator.³ The shape of A (see Fig. 11.7(a)) is not as regular as the one
displayed in Chap. 7 (compare to Fig. 7.8). This difference results from the
use of a finite element mesh with triangles, instead of a rectangular grid.
Furthermore, the structure of A strongly depends on the node ordering,
as can be seen by comparing the matrices obtained on the coarse mesh (Fig.
11.7(a)) and on a finer mesh (Fig. 11.7(b)). The obvious difference is due to
a reordering of the nodes for the coarse-mesh calculation.

³ The value of the heat conductivity coefficient c is not relevant for the matrix
structure.
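The influence of the node ordering on the matrix pattern can be reproduced on a small model matrix. The sketch below is not taken from the project files: it uses a tridiagonal matrix, scatters its pattern with an arbitrary renumbering, and then applies the reverse Cuthill-McKee reordering provided by the MATLAB function `symrcm`. The reordering actually used for the coarse mesh of Fig. 11.7 is not specified in the text, so `symrcm` is only one possible choice.

```matlab
n = 8;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % tridiagonal model matrix
p = [1 5 2 6 3 7 4 8];                  % an arbitrary renumbering of the nodes
B = A(p,p);                             % same matrix, scattered pattern

[ri, ci] = find(B);
bwB = max(abs(ri - ci));                % bandwidth before reordering

r = symrcm(B);                          % reverse Cuthill-McKee permutation
C = B(r,r);
[ri, ci] = find(C);
bwC = max(abs(ri - ci));                % bandwidth after reordering

% spy(B) and spy(C) display the two patterns, in the spirit of Fig. 11.7.
```

The number of nonzero entries is unchanged by the permutation; only their distance to the diagonal varies, which matters for band or profile storage and for the fill-in of direct solvers.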


Fig. 11.7. Associated matrix. (a) coarse mesh, nz = 1007; (b) fine mesh, nz = 4739.
[Panel titles: "Matrix A (coarse mesh)", "Matrix A (fine mesh)".]


Fluid Dynamics: Solving the Two-Dimensional
Navier-Stokes Equations


Project Summary


Level of difficulty: 3

Keywords: Navier-Stokes equations, Helmholtz equation, Poisson
equation, projection method, ADI factorization, FFT, Fourier transform

Application fields: Incompressible flows, jet flow, Kelvin-Helmholtz
instability, vortex dipole


12.1 Introduction 


The Navier-Stokes system of partial differential equations (PDEs) contains 
the main conservation laws that universally describe the evolution of a fluid 
(i.e., liquid or gaseous) flow. Even though these laws have been well estab- 
lished since the nineteenth century, the complete description of their intrinsic 
properties remains one of the challenging topics of modern physics and math- 
ematics. 

In this chapter, we consider some simplifying hypotheses that make the 
Navier-Stokes equations tractable with relatively simple numerical methods: 


• the density of the fluid is assumed constant ($\rho = \rho_0$), i.e., the fluid is
  incompressible;
• the flow parameters depend on only two space variables ($x$ and $y$), i.e., the
  flow is two-dimensional or 2D;
• all the variables are considered as periodic functions of both $x$ and $y$, i.e.,
  we impose periodic boundary conditions.


This model allows the study of simple, but fascinating, phenomena that turn
out to give us an understanding of more complicated flows too. In this project,
we shall numerically simulate:

• the Kelvin-Helmholtz instability of a mixing layer, and
• the evolution of a particular vortex structure, the vortex dipole.


From a numerical point of view, this computational project introduces the
following algorithms or numerical schemes of more general interest:

• the space discretization using a staggered grid and 2D finite difference
  schemes;
• the combined Adams-Bashforth and Crank-Nicolson schemes for the time
  integration;
• an alternating direction implicit, or ADI, method for solving the Helmholtz
  equation;
• a solver for the periodic Poisson equation based on fast Fourier transforms,
  or FFTs;
• the Thomas algorithm for solving a tridiagonal linear system.


12.2 The Incompressible Navier—Stokes Equations 


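The 2D flow-field of an incompressible fluid is completely described by the
velocity vector $q = (u(x, y), v(x, y)) \in \mathbb{R}^2$ and the pressure $p(x, y) \in \mathbb{R}$. These
functions are a solution of the following conservation laws (see, for instance,
Hirsch, 1988):

• mass conservation:

$$
\operatorname{div}(q) = 0, \qquad (12.1)
$$

  or, written using the explicit form of the divergence¹ operator,

$$
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0; \qquad (12.2)
$$

• the momentum conservation equations in the compact form²

$$
\frac{\partial q}{\partial t} + \operatorname{div}(q \otimes q) = -\mathcal{G}p + \frac{1}{Re}\,\Delta q, \qquad (12.3)
$$

  or, in explicit form,

$$
\begin{cases}
\dfrac{\partial u}{\partial t} + \dfrac{\partial u^2}{\partial x} + \dfrac{\partial uv}{\partial y}
= -\dfrac{\partial p}{\partial x} + \dfrac{1}{Re}\left( \dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} \right), \\[10pt]
\dfrac{\partial v}{\partial t} + \dfrac{\partial uv}{\partial x} + \dfrac{\partial v^2}{\partial y}
= -\dfrac{\partial p}{\partial y} + \dfrac{1}{Re}\left( \dfrac{\partial^2 v}{\partial x^2} + \dfrac{\partial^2 v}{\partial y^2} \right).
\end{cases}
\qquad (12.4)
$$

¹ We recall the definitions of the differential operators divergence, gradient, and
Laplacian for a 2D field: if $v = (v_x, v_y) : \mathbb{R}^2 \to \mathbb{R}^2$ and $\varphi : \mathbb{R}^2 \to \mathbb{R}$, then

$$
\operatorname{div}(v) = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y}, \quad
\mathcal{G}\varphi = \left( \frac{\partial \varphi}{\partial x}, \frac{\partial \varphi}{\partial y} \right), \quad
\Delta\varphi = \operatorname{div}(\mathcal{G}\varphi) = \frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2},
$$

and $\Delta v = (\Delta v_x, \Delta v_y)$.

² We denote by $\otimes$ the tensor product.


The previous equations are written in dimensionless form, using the
following scaled variables:

$$
x = \frac{x^*}{L}, \quad y = \frac{y^*}{L}, \quad u = \frac{u^*}{V_0}, \quad v = \frac{v^*}{V_0}, \quad
t = \frac{t^*}{L/V_0}, \quad p = \frac{p^*}{\rho_0 V_0^2},
\qquad (12.5)
$$

where the superscript (*) denotes variables measured in physical units. The
constants $L$, $V_0$ are, respectively, the reference length and velocity that
characterize the simulated flow. The dimensionless number $Re$ is called the Reynolds
number and quantifies the relative importance of inertial (or convective) terms
and viscous (or diffusion)³ terms in the flow:

$$
Re = \frac{V_0 L}{\nu}, \qquad (12.6)
$$

where $\nu$ is the kinematic viscosity of the fluid.
To summarize, the Navier-Stokes system of PDEs that will be numerically
solved in this project is defined by (12.2) and (12.4); the initial condition (at
$t = 0$) and the boundary conditions will be discussed in the following sections.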


12.3 Numerical Algorithm 


We start by presenting the fractional-step method (Kim and Moin, 1985;
Orlandi, 1999; Ferziger and Perić, 2002) as a general algorithm to solve the
Navier-Stokes equations. This algorithm belongs to the class of so-called
projection methods and has become rather popular in computational fluid
dynamics. An extensive presentation of this method in a more general framework
can be found in the recent book by Orlandi (1999).

We use a fractional-step method that consists of two steps:

1. The predictor step: we solve the momentum equations (12.3) written in
   the compact form

$$
\frac{\partial q}{\partial t} = -\mathcal{G}p + H + \frac{1}{Re}\,\Delta q, \quad \text{for } q = (u, v) \in \mathbb{R}^2, \qquad (12.7)
$$

   where $H$ is the vector containing the convective terms

$$
H = \left( -\frac{\partial u^2}{\partial x} - \frac{\partial uv}{\partial y},\ -\frac{\partial uv}{\partial x} - \frac{\partial v^2}{\partial y} \right), \qquad (12.8)
$$

   and $\mathcal{G}p$ the pressure gradient vector. Time discretization of (12.7)
   combines the explicit Adams-Bashforth scheme (for the convective terms $H$)
   with the semi-implicit Crank-Nicolson scheme (for the diffusion terms
   $\Delta q$). If $\delta t$ denotes the calculation time step (supposed constant), the time
   advancement of the solution from $t_n = n\,\delta t$ to $t_{n+1} = (n + 1)\,\delta t$ follows the
   scheme

$$
\frac{q^* - q^n}{\delta t} = -\mathcal{G}p^n
+ \underbrace{\frac{3}{2} H^n - \frac{1}{2} H^{n-1}}_{\text{Adams-Bashforth}}
+ \underbrace{\frac{1}{Re}\,\Delta\!\left( \frac{q^* + q^n}{2} \right)}_{\text{Crank-Nicolson}}.
\qquad (12.9)
$$

   In the previous equation, the pressure is treated as an explicit term
   (computed at time $t_n$). As a consequence, the velocity vector $q^*$ does not satisfy
   the mass conservation equation (12.1).

³ The model scalar equations describing the convection and diffusion phenomena
are discussed in Chap. 1.

2. The corrector (or projection) step: the velocity $q^*$ is corrected such that
   the velocity field $q^{n+1}$ is divergence-free (or solenoidal). We use the
   following correction equation:⁴

$$
q^{n+1} - q^* = -\delta t\, \mathcal{G}\phi. \qquad (12.10)
$$

   The variable $\phi$ (related to the pressure, but without any physical meaning)
   is calculated by taking the divergence of (12.10); using that $\operatorname{div}(q^{n+1}) = 0$
   and $\operatorname{div}(\mathcal{G}\phi) = \Delta\phi$, we obtain a Poisson equation

$$
\Delta\phi = \frac{1}{\delta t}\, \operatorname{div}(q^*). \qquad (12.11)
$$

   To close the algorithm, the pressure for the next time step is updated
   using⁵

$$
p^{n+1} = p^n + \phi - \frac{\delta t}{2Re}\,\Delta\phi. \qquad (12.12)
$$

To summarize, the numerical algorithm consists of the following steps,
rearranged below in the form that will be used in computer programs:


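Algorithm 12.1. To solve the Navier-Stokes equations (12.2)-(12.4). Given
the field $(u^n, v^n, p^n)$, compute:

(A) the explicit terms $H^n$:

$$
H_u^n = -\left( \frac{\partial u^2}{\partial x} + \frac{\partial uv}{\partial y} \right)^n, \qquad (12.13)
$$

$$
H_v^n = -\left( \frac{\partial uv}{\partial x} + \frac{\partial v^2}{\partial y} \right)^n; \qquad (12.14)
$$

(B) the nonsolenoidal field $q^* = (u^*, v^*)$ by solving the Helmholtz equations

$$
\left( I - \frac{\delta t}{2Re}\,\Delta \right) u^* = u^n + \delta t \left[ -\frac{\partial p^n}{\partial x} + \frac{3}{2} H_u^n - \frac{1}{2} H_u^{n-1} + \frac{1}{2Re}\,\Delta u^n \right], \qquad (12.15)
$$

$$
\left( I - \frac{\delta t}{2Re}\,\Delta \right) v^* = v^n + \delta t \left[ -\frac{\partial p^n}{\partial y} + \frac{3}{2} H_v^n - \frac{1}{2} H_v^{n-1} + \frac{1}{2Re}\,\Delta v^n \right]; \qquad (12.16)
$$

(C) the variable $\phi$ by solving the Poisson equation

$$
\Delta\phi = \frac{1}{\delta t} \left( \frac{\partial u^*}{\partial x} + \frac{\partial v^*}{\partial y} \right); \qquad (12.17)
$$

(D) the solenoidal field $q^{n+1} = (u^{n+1}, v^{n+1})$, with

$$
u^{n+1} = u^* - \delta t\, \frac{\partial \phi}{\partial x}, \qquad (12.18)
$$

$$
v^{n+1} = v^* - \delta t\, \frac{\partial \phi}{\partial y}; \qquad (12.19)
$$

(E) the new pressure:

$$
p^{n+1} = p^n + \phi - \frac{\delta t}{2Re}\,\Delta\phi. \qquad (12.20)
$$

Steps (A)-(E) are repeated for each time step.

⁴ The idea behind this equation comes from the observation that $q^*$ and $q^{n+1}$
have the same curl. Indeed, the pressure in the Navier-Stokes equation can be
eliminated by taking the curl of the momentum equations. We recall that for the
vector field $v = (v_x, v_y)$, $\operatorname{curl}(v) = \partial v_y/\partial x - \partial v_x/\partial y$ measures the amount of
rotation or the angular momentum of the field.

⁵ This equation is obtained as follows: we write (12.9) with an implicit discretization
of the pressure term

$$
\frac{q^{n+1} - q^n}{\delta t} = -\mathcal{G}p^{n+1} + \frac{3}{2} H^n - \frac{1}{2} H^{n-1} + \frac{1}{Re}\,\Delta\!\left( \frac{q^{n+1} + q^n}{2} \right)
$$

and subtract this equation from (12.9). After replacing $q^*$ from (12.10), we get
(12.12), up to an additive constant. Note that this constant is discarded by taking
the gradient of the pressure in the momentum equations.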
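The time discretization used in the predictor step (12.9) can be tried on a scalar model before tackling the full system. The sketch below is not part of the project programs: it integrates the ordinary differential equation $u' = H(u) - d\,u$ with $H(u) = -a\,u$, treating $H$ with the Adams-Bashforth scheme and the term $-d\,u$ (which plays the role of the diffusion term) with the Crank-Nicolson scheme. The coefficients `a`, `d`, the time step, and the start-up choice $H^{-1} = H^0$ are assumptions made for this illustration.

```matlab
a = 1;  d = 4;              % explicit and implicit decay rates (illustrative)
dt = 0.01;  nsteps = 100;   % integrate up to t = 1

u = 1;                      % initial condition
Hold = -a*u;                % start-up: H at step -1 taken equal to H at step 0

for n = 1:nsteps
    H = -a*u;                                        % explicit term at time t_n
    rhs = u + dt*(1.5*H - 0.5*Hold + 0.5*(-d)*u);    % known part of the scheme
    u = rhs / (1 + 0.5*dt*d);                        % scalar "Helmholtz" solve
    Hold = H;
end

err = abs(u - exp(-(a+d)*nsteps*dt));                % compare with the exact solution
```

In the Navier-Stokes algorithm the division by `(1 + 0.5*dt*d)` is replaced by the solution of the Helmholtz equations (12.15) and (12.16), but the bookkeeping of `H` and `Hold` is the same.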


12.4 Computational Domain, Staggered Grids, and 
Boundary Conditions 


Numerically solving the Navier-Stokes equations is considerably simplified
by considering a rectangular domain $L_x \times L_y$ (see Fig. 12.1) with periodic
boundary conditions everywhere. The periodicity of the velocity q(x, y) and
pressure p(x, y) fields is mathematically expressed as

$$q(0, y) = q(L_x, y), \quad p(0, y) = p(L_x, y), \quad \forall y \in [0, L_y], \tag{12.21}$$

$$q(x, 0) = q(x, L_y), \quad p(x, 0) = p(x, L_y), \quad \forall x \in [0, L_x]. \tag{12.22}$$

The points at which the solution will be computed are distributed in the
domain following a rectangular and uniform 2D grid. Since not all the variables
share the same grid in our approach, we first define a primary grid (see
Fig. 12.1) generated by taking $n_x$ computational points along x and, respectively,
$n_y$ points along y:

















$$x_c(i) = (i - 1)\,\delta x, \quad \delta x = \frac{L_x}{n_x - 1}, \quad i = 1, \ldots, n_x, \tag{12.23}$$

$$y_c(j) = (j - 1)\,\delta y, \quad \delta y = \frac{L_y}{n_y - 1}, \quad j = 1, \ldots, n_y. \tag{12.24}$$

















Fig. 12.1. Computational domain, staggered grids, and boundary conditions. 


A secondary grid is defined by the centers of the primary grid cells:

$$x_m(i) = (i - 1/2)\,\delta x, \quad i = 1, \ldots, n_{xm}, \tag{12.25}$$

$$y_m(j) = (j - 1/2)\,\delta y, \quad j = 1, \ldots, n_{ym}, \tag{12.26}$$

where we have used the shorthand notation $n_{xm} = n_x - 1$, $n_{ym} = n_y - 1$. Inside
a computational cell defined as the rectangle $[x_c(i), x_c(i+1)] \times [y_c(j), y_c(j+1)]$,
the unknown variables u, v, p will be computed as approximations of the
solution at different space locations:

• $u(i, j) \approx u(x_c(i), y_m(j))$ (west face of the cell),
• $v(i, j) \approx v(x_m(i), y_c(j))$ (south face of the cell),
• $p(i, j) \approx p(x_m(i), y_m(j))$ (center of the cell).


This staggered arrangement of the variables has the advantage of a strong 
coupling between pressure and velocity. It also helps (see the references at the 
end of the chapter) to avoid some problems of stability and convergence ex- 
perienced with collocated arrangements (where all the variables are computed 
at the same grid points). 
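The grid definitions (12.23)-(12.26) translate almost literally into MATLAB. The following lines are only a minimal sketch; the variable names (xc, yc, xm, ym, nxm, nym) and the numerical values are illustrative assumptions, not taken from the programs of this chapter.

```matlab
% Sketch: primary and secondary (staggered) grids, following (12.23)-(12.26).
% All names and values below are illustrative assumptions.
Lx = 1;  Ly = 2;               % domain size
nx = 21; ny = 51;              % number of primary grid points along x and y
nxm = nx - 1;  nym = ny - 1;   % number of cells along x and y
dx = Lx/nxm;   dy = Ly/nym;    % grid steps
xc = (0:nx-1)*dx;              % primary grid:   x_c(i) = (i-1)*dx
yc = (0:ny-1)*dy;              %                 y_c(j) = (j-1)*dy
xm = ((1:nxm) - 0.5)*dx;       % secondary grid: x_m(i) = (i-1/2)*dx
ym = ((1:nym) - 0.5)*dy;       %                 y_m(j) = (j-1/2)*dy
```

With these arrays, u(i,j) is attached to (xc(i), ym(j)), v(i,j) to (xm(i), yc(j)), and p(i,j) to (xm(i), ym(j)).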








12.5 Finite Difference Discretization 


In this section, Algorithm 12.1 will be written in a discrete form that will be
used in the computer programs of this project. We start by noticing that the
periodic boundary conditions take the following discrete form:

$$u(1, j) = u(n_x, j), \ \forall j = 1, \ldots, n_y; \qquad u(i, 1) = u(i, n_y), \ \forall i = 1, \ldots, n_x, \tag{12.27}$$

and similar relations for v and p. As a consequence, the unknowns of the
problem are only the $n_{xm} \times n_{ym}$ values

$$u(i, j), \ v(i, j), \ p(i, j), \quad i = 1, \ldots, n_{xm}, \ j = 1, \ldots, n_{ym}.$$


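A very useful programming trick (see also Chap. 1) in implementing discrete
periodic boundary conditions (12.27) consists in defining the supplementary
arrays

$$\mathrm{ip}(i) = i + 1, \ i = 1, \ldots, (n_{xm} - 1), \qquad \mathrm{jp}(j) = j + 1, \ j = 1, \ldots, (n_{ym} - 1),$$
$$\mathrm{ip}(n_{xm}) = 1, \qquad \mathrm{jp}(n_{ym}) = 1, \tag{12.28}$$

$$\mathrm{im}(i) = i - 1, \ i = 2, \ldots, n_{xm}, \qquad \mathrm{jm}(j) = j - 1, \ j = 2, \ldots, n_{ym},$$
$$\mathrm{im}(1) = n_{xm}, \qquad \mathrm{jm}(1) = n_{ym}, \tag{12.29}$$

and using the vectorial capabilities of MATLAB to write the finite difference
discretization of the differential operators in a very compact form. For example,
to compute $(\partial\psi/\partial x)(i, j)$ for a fixed j and all $i = 1, \ldots, n_{xm}$, the
second-order centered finite difference scheme is explicitly written as

$$\frac{\partial\psi}{\partial x}(i, j) \approx \frac{\psi(i+1, j) - \psi(i-1, j)}{2\,\delta x}, \quad i = 2, \ldots, (n_{xm} - 1),$$

with a particular treatment of indices $i = 1$ and $i = n_{xm}$:

$$\frac{\partial\psi}{\partial x}(1, j) \approx \frac{\psi(2, j) - \psi(n_{xm}, j)}{2\,\delta x}, \qquad \frac{\partial\psi}{\partial x}(n_{xm}, j) \approx \frac{\psi(1, j) - \psi(n_{xm} - 1, j)}{2\,\delta x}.$$

Using the vectors im and ip from (12.28) and (12.29) we can compress the
previous relations into a single one:

$$\frac{\partial\psi}{\partial x}(i, j) \approx \frac{\psi(\mathrm{ip}(i), j) - \psi(\mathrm{im}(i), j)}{2\,\delta x}, \quad i = 1, \ldots, n_{xm}. \tag{12.30}$$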


Remark 12.1. As a general programming rule, in a finite difference scheme
with periodic boundary conditions, we shall replace indices (i + 1) by ip(i)
and (i - 1) by im(i) (and similarly for j indices).
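As an illustration of this rule, the sketch below builds the index vectors ip and im and applies the compact centered difference to a periodic test function. The test function, the grid sizes, and the variable names are assumptions chosen only for the example.

```matlab
% Sketch: periodic index arrays (12.28)-(12.29) and the centered
% difference (12.30), checked on psi(x) = sin(2*pi*x/Lx).
Lx = 1;  nxm = 64;  nym = 4;              % illustrative sizes
dx = Lx/nxm;
ip = [2:nxm, 1];                          % ip(i) = i+1, with ip(nxm) = 1
im = [nxm, 1:nxm-1];                      % im(i) = i-1, with im(1) = nxm
x   = (0:nxm-1)'*dx;                      % one period, end point excluded
psi = repmat(sin(2*pi*x/Lx), 1, nym);     % nxm-by-nym array
dpsidx = (psi(ip,:) - psi(im,:))/(2*dx);  % all i and j at once, no special cases
exact  = repmat(2*pi/Lx*cos(2*pi*x/Lx), 1, nym);
err = max(abs(dpsidx(:) - exact(:)));     % second-order truncation error
```

Indexing with ip and im removes the need for separate code at i = 1 and i = nxm, which is the point of the remark.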


We are now equipped to present in detail the full discrete form of each 
step of Algorithm 12.1. 
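(A) Computation of Explicit Terms


The two components $H_u^n$ (12.13) and $H_v^n$ (12.14) of the explicit term $H^n$
are computed at the same points of the grid as the corresponding velocities.
To follow the logic of the discretization below, the reader is invited to add
to Fig. 12.1 the adjacent cells (i, j + 1), (i + 1, j). Using the centered finite
difference scheme (12.30) we easily obtain:

• for the computation of the velocity u (located at $(x_c(i), y_m(j))$):

for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$\frac{\partial u^2}{\partial x}(i, j) \approx \frac{1}{\delta x}\left[\left(\frac{u(\mathrm{ip}(i), j) + u(i, j)}{2}\right)^2 - \left(\frac{u(i, j) + u(\mathrm{im}(i), j)}{2}\right)^2\right],$$

$$\frac{\partial uv}{\partial y}(i, j) \approx \frac{1}{\delta y}\left[\left(\frac{u(i, \mathrm{jp}(j)) + u(i, j)}{2}\right)\left(\frac{v(i, \mathrm{jp}(j)) + v(\mathrm{im}(i), \mathrm{jp}(j))}{2}\right) - \left(\frac{u(i, j) + u(i, \mathrm{jm}(j))}{2}\right)\left(\frac{v(i, j) + v(\mathrm{im}(i), j)}{2}\right)\right],$$

$$H_u^n(i, j) = -\frac{\partial u^2}{\partial x}(i, j) - \frac{\partial uv}{\partial y}(i, j); \tag{12.31}$$

• and similarly for the velocity v (located at $(x_m(i), y_c(j))$):

for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$\frac{\partial uv}{\partial x}(i, j) \approx \frac{1}{\delta x}\left[\left(\frac{u(\mathrm{ip}(i), j) + u(\mathrm{ip}(i), \mathrm{jm}(j))}{2}\right)\left(\frac{v(i, j) + v(\mathrm{ip}(i), j)}{2}\right) - \left(\frac{u(i, j) + u(i, \mathrm{jm}(j))}{2}\right)\left(\frac{v(i, j) + v(\mathrm{im}(i), j)}{2}\right)\right],$$

$$\frac{\partial v^2}{\partial y}(i, j) \approx \frac{1}{\delta y}\left[\left(\frac{v(i, \mathrm{jp}(j)) + v(i, j)}{2}\right)^2 - \left(\frac{v(i, j) + v(i, \mathrm{jm}(j))}{2}\right)^2\right],$$

$$H_v^n(i, j) = -\frac{\partial uv}{\partial x}(i, j) - \frac{\partial v^2}{\partial y}(i, j). \tag{12.32}$$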


(B) Computation of the Nonsolenoidal Velocity Field 


We first notice that (12.15) and (12.16) can be written in the compact form
of a Helmholtz equation:

$$\underbrace{\left(I - \frac{\delta t}{2Re}\Delta\right)}_{\text{Helmholtz operator}} \delta q^* = \underbrace{\delta t\left[-\mathcal{G}p^n + \frac{3}{2}H^n - \frac{1}{2}H^{n-1} + \frac{1}{Re}\Delta q^n\right]}_{RHS^n}, \tag{12.33}$$

where we have introduced the notation $\delta q^* = q^* - q^n$. When periodic boundary
conditions are imposed, this equation is usually solved using fast Fourier
transforms (or FFT). We present in the following a different method, the
alternating direction implicit (or ADI) method, which is easier to implement and
has the advantage that it easily takes into account other types of boundary
conditions. The Helmholtz operator is approximated since the terms $O(\delta t^2)$
are neglected:

$$\left(I - \frac{\delta t}{2Re}\Delta\right)\delta q^* \approx \left(I - \frac{\delta t}{2Re}\frac{\partial^2}{\partial x^2}\right)\left(I - \frac{\delta t}{2Re}\frac{\partial^2}{\partial y^2}\right)\delta q^*. \tag{12.34}$$

This second-order accurate factorization⁷ is used to solve (12.33) in two steps:

$$\left(I - \frac{\delta t}{2Re}\frac{\partial^2}{\partial x^2}\right)\overline{\delta q^*} = RHS^n \quad (+\ \text{periodicity along } x), \tag{12.35}$$

$$\left(I - \frac{\delta t}{2Re}\frac{\partial^2}{\partial y^2}\right)\delta q^* = \overline{\delta q^*} \quad (+\ \text{periodicity along } y). \tag{12.36}$$

It is important to note that we also impose periodic boundary conditions for
the intermediate field $\overline{\delta q^*}$, which is physically meaningless. This choice, which
seems to be natural for our periodic problem, becomes more difficult for other
types of boundary conditions (Dirichlet type, for example) and needs further
analytical development.

For the discretization of second derivatives in (12.35) and (12.36) we use 
a second-order centered finite difference scheme, written here in the general 
form (see also Chap. 1) 








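$$\frac{\partial^2\psi}{\partial x^2}(i, j) \approx \frac{\psi(i+1, j) - 2\psi(i, j) + \psi(i-1, j)}{\delta x^2}, \qquad \frac{\partial^2\psi}{\partial y^2}(i, j) \approx \frac{\psi(i, j+1) - 2\psi(i, j) + \psi(i, j-1)}{\delta y^2}. \tag{12.37}$$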


Finally, the algorithm used in the programs of this chapter is the following: 


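Algorithm 12.2. (computes u* using an ADI method):


• First step of ADI: for all $j = 1, \ldots, n_{ym}$ solve the linear system

$$-\beta_x\,\overline{\delta u^*}(i-1, j) + (1 + 2\beta_x)\,\overline{\delta u^*}(i, j) - \beta_x\,\overline{\delta u^*}(i+1, j) = RHS_u^n(i, j), \tag{12.38}$$

where $i = 1, \ldots, n_{xm}$ and $\beta_x = \dfrac{\delta t}{2Re}\dfrac{1}{\delta x^2}$. In the previous relation, we take
into account the periodicity by imposing

$$\overline{\delta u^*}(0, j) = \overline{\delta u^*}(n_{xm}, j), \qquad \overline{\delta u^*}(n_{xm} + 1, j) = \overline{\delta u^*}(1, j).$$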


6 This is the subject of an exercise of this chapter. 
7 The methods based on this idea are also known as approximate factorization or
splitting methods.
More precisely, we have to solve $n_{ym}$ linear systems with the following
matrix of size $n_{xm} \times n_{xm}$:

$$M_x = \begin{pmatrix}
1 + 2\beta_x & -\beta_x & 0 & \cdots & 0 & 0 & -\beta_x \\
-\beta_x & 1 + 2\beta_x & -\beta_x & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & -\beta_x & 1 + 2\beta_x & -\beta_x \\
-\beta_x & 0 & 0 & \cdots & 0 & -\beta_x & 1 + 2\beta_x
\end{pmatrix}. \tag{12.39}$$

This particular matrix pattern will be referred to in the following as a
tridiagonal periodic matrix.
An efficient method to solve such systems will be derived later, based on
the well-known Thomas algorithm (Algorithm 12.5).
At the end of this step we obtain $\overline{\delta u^*}(i, j)$.











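• Second step of ADI: for all $i = 1, \ldots, n_{xm}$ solve the linear system

$$-\beta_y\,\delta u^*(i, j-1) + (1 + 2\beta_y)\,\delta u^*(i, j) - \beta_y\,\delta u^*(i, j+1) = \overline{\delta u^*}(i, j), \tag{12.40}$$

where $j = 1, \ldots, n_{ym}$ and $\beta_y = \dfrac{\delta t}{2Re}\dfrac{1}{\delta y^2}$. The periodicity requires that

$$\delta u^*(i, 0) = \delta u^*(i, n_{ym}), \qquad \delta u^*(i, n_{ym} + 1) = \delta u^*(i, 1).$$

We obtain this time $n_{xm}$ linear systems with tridiagonal and periodic
matrices of size $n_{ym} \times n_{ym}$:

$$M_y = \begin{pmatrix}
1 + 2\beta_y & -\beta_y & 0 & \cdots & 0 & 0 & -\beta_y \\
-\beta_y & 1 + 2\beta_y & -\beta_y & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & -\beta_y & 1 + 2\beta_y & -\beta_y \\
-\beta_y & 0 & 0 & \cdots & 0 & -\beta_y & 1 + 2\beta_y
\end{pmatrix}. \tag{12.41}$$

At the end of this step we get $\delta u^*(i, j)$ and immediately
$u^*(i, j) = u^n(i, j) + \delta u^*(i, j)$.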
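Before writing a dedicated solver, the two ADI sweeps can be prototyped with sparse matrices and the MATLAB backslash operator. This is only a sketch under stated assumptions: the helper permat, the random right-hand side, and all numerical values are hypothetical, and backslash stands in for the Thomas-type algorithm derived later.

```matlab
% Sketch: one ADI solve, (12.38)-(12.41), with sparse periodic tridiagonal
% matrices and backslash. Names and values are illustrative assumptions.
nxm = 32;  nym = 48;  dx = 1/nxm;  dy = 2/nym;
dt = 1e-2;  Re = 100;
bx = dt/(2*Re*dx^2);  by = dt/(2*Re*dy^2);
% periodic tridiagonal matrix: (-b, 1+2b, -b) on the three diagonals,
% plus -b in the upper-right and lower-left corners
permat = @(n,b) spdiags(repmat([-b, 1+2*b, -b], n, 1), -1:1, n, n) ...
              + sparse([1 n], [n 1], [-b -b], n, n);
Mx = permat(nxm, bx);
My = permat(nym, by);
rhs = rand(nxm, nym);        % stands in for RHS_u^n(i,j)
dub = Mx \ rhs;              % first step: every column is a system along x
du  = (My \ dub.').';        % second step: every row is a system along y
```

Solving for all columns (or rows) in a single call mirrors the simultaneous treatment of the n_ym (or n_xm) systems described in the algorithm.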


The computation procedure is similar for the other component of the ve- 
locity. Considering that it could be helpful to correctly program the algorithm, 
we present here the details of the computation: 





Algorithm 12.3. (computes v* using an ADI method):


• First step of ADI: for all $j = 1, \ldots, n_{ym}$ solve the linear system

$$M_x\,\overline{\delta v^*}(i, j) = RHS_v^n(i, j), \tag{12.42}$$

where $i = 1, \ldots, n_{xm}$ and the matrix $M_x$ is given by (12.39).
At the end of this step we obtain $\overline{\delta v^*}(i, j)$.
• Second step of ADI: for all $i = 1, \ldots, n_{xm}$, solve the linear system

$$M_y\,\delta v^*(i, j) = \overline{\delta v^*}(i, j), \tag{12.43}$$

where $j = 1, \ldots, n_{ym}$ and the matrix $M_y$ is given by (12.41).
We obtain $\delta v^*(i, j)$ and immediately

$$v^*(i, j) = v^n(i, j) + \delta v^*(i, j).$$


(C) Solving the Poisson Equation 


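The Poisson equation (12.11) is discretized as

$$\Delta\phi(i, j) = Q(i, j), \tag{12.44}$$

where $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$, and

$$Q(i, j) = \frac{1}{\delta t}\,\mathrm{div}(q^*)(i, j) = \frac{1}{\delta t}\left(\frac{\partial u^*}{\partial x} + \frac{\partial v^*}{\partial y}\right)(i, j).$$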


To solve this equation, we first use the periodicity along the x direction and
expand the variable $\phi$ in a discrete Fourier series:

$$\phi(i, j) = \sum_{l=1}^{n_{xm}} \hat\phi_l(j)\, e^{\mathrm{i}\frac{2\pi}{n_{xm}}(i-1)(l-1)}, \quad \forall i = 1, \ldots, n_{xm}, \tag{12.45}$$

where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit. The advantage of using a Fourier series
expansion is to diagonalize the Laplace operator and thus reduce the initial
2D problem (12.44) to a 1D problem. Indeed, considering an approximation
of $\partial^2\phi/\partial x^2$ by second-order centered differences, we obtain

$$\frac{\partial^2\phi}{\partial x^2}(i, j) \approx \frac{\phi(i+1, j) - 2\phi(i, j) + \phi(i-1, j)}{\delta x^2}$$
$$= \frac{1}{\delta x^2}\sum_{l=1}^{n_{xm}} \hat\phi_l(j)\, e^{\mathrm{i}\frac{2\pi}{n_{xm}}(i-1)(l-1)}\left(e^{\mathrm{i}\frac{2\pi}{n_{xm}}(l-1)} - 2 + e^{-\mathrm{i}\frac{2\pi}{n_{xm}}(l-1)}\right)$$
$$= \sum_{l=1}^{n_{xm}} \hat\phi_l(j)\, e^{\mathrm{i}\frac{2\pi}{n_{xm}}(i-1)(l-1)}\, \underbrace{\frac{2}{\delta x^2}\left[\cos\left(\frac{2\pi}{n_{xm}}(l-1)\right) - 1\right]}_{k_l}.$$

Using a similar Fourier series expansion for the right-hand-side function
Q (which is also periodic),

$$Q(i, j) = \sum_{l=1}^{n_{xm}} \hat Q_l(j)\, e^{\mathrm{i}\frac{2\pi}{n_{xm}}(i-1)(l-1)},$$

we find that solving the initial problem (12.44) is equivalent to solving $n_{xm}$
1D equations:

$$\frac{\partial^2\hat\phi_l}{\partial y^2} + k_l\,\hat\phi_l = \hat Q_l, \tag{12.47}$$

where $l = 1, \ldots, n_{xm}$ is the wave number of the Fourier expansion.

At this point of our numerical algorithm, equations (12.47) can be solved
using either a similar Fourier expansion along y or a finite difference scheme
to discretize the second derivative. Since the first method can be applied only
for periodic boundary conditions, we choose the second one, which can be
easily adapted to more general cases required by further development of this
project (a wall boundary condition, for example). With this choice, (12.47)
becomes, for $j = 1, \ldots, n_{ym}$,

$$\frac{1}{\delta y^2}\hat\phi_l(j-1) + \left(-\frac{2}{\delta y^2} + k_l\right)\hat\phi_l(j) + \frac{1}{\delta y^2}\hat\phi_l(j+1) = \hat Q_l(j). \tag{12.48}$$

This equation must be supplemented with discrete boundary conditions for
$j = 1$ and $j = n_{ym}$. In our case, we naturally use the periodicity of the
function $\hat\phi_l$.

It is important to note that special treatment is required for the wave
number $l = 1$ for which $k_1 = 0$. For this value, it is easy to see that the matrix
of the system (12.48) is singular and our formulation is not well-posed! Indeed,
the solution of the Poisson equation with periodic boundary conditions is
determined up to an additive constant. This constant is exactly the first term
(or the average value) of the discrete Fourier expansion (12.45). Since the
absolute value of the pressure is of no significance for an incompressible flow (we
saw that only the gradient of the pressure appears in the equations), we shall
not worry about this constant, which will be freely fixed! Nevertheless, a
reasonable choice would be to impose $\hat\phi_1(j) = 0$, for $j = 1, \ldots, n_{ym}$, which yields
zero average solutions $\phi$. The final algorithm to solve the Poisson equation is
presented in great detail in the following.


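Algorithm 12.4. To compute the pressure correction $\phi$ at points of
coordinates $(x_m(i), y_m(j))$:

• Compute the array Q(i, j), for $i = 1, \ldots, n_{xm}$ and $j = 1, \ldots, n_{ym}$, by

$$Q(i, j) = \frac{1}{\delta t}\left[\frac{u^*(\mathrm{ip}(i), j) - u^*(i, j)}{\delta x} + \frac{v^*(i, \mathrm{jp}(j)) - v^*(i, j)}{\delta y}\right]. \tag{12.49}$$

• Apply a fast Fourier transform (FFT) to each column of the array Q:

$$\hat Q(l, j) = \mathrm{FFT}(Q(i, j)), \quad l = 1, \ldots, n_{xm}, \quad j = 1, \ldots, n_{ym}. \tag{12.50}$$

The reader is of course aware that the values $\hat Q$ are complex!
• For $l = 1$ impose $\hat\phi_1(j) = 0$, for $j = 1, \ldots, n_{ym}$.
• For each $l = 2, \ldots, n_{xm}$, solve the linear system $M\hat\phi_l = \hat Q(l, \cdot)^T$ with a
tridiagonal periodic matrix of size $(n_{ym} \times n_{ym})$:

$$M = \frac{1}{\delta y^2}\begin{pmatrix}
-2 + \delta y^2 k_l & 1 & 0 & \cdots & 0 & 0 & 1 \\
1 & -2 + \delta y^2 k_l & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -2 + \delta y^2 k_l & 1 \\
1 & 0 & 0 & \cdots & 0 & 1 & -2 + \delta y^2 k_l
\end{pmatrix}, \tag{12.51}$$

where

$$k_l = \frac{2}{\delta x^2}\left[\cos\left(\frac{2\pi}{n_{xm}}(l-1)\right) - 1\right].$$

• Build the array $\hat\Phi(l, j)$, whose rows are the already computed vectors $\hat\phi_l$.
• Apply an inverse Fourier transform (IFFT) to obtain the final solution

$$\phi(i, j) = \mathrm{IFFT}(\hat\Phi(l, j)), \quad i = 1, \ldots, n_{xm}, \quad j = 1, \ldots, n_{ym}. \tag{12.52}$$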
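The steps of Algorithm 12.4 can be sketched on a manufactured problem for which the solution is known. In the lines below, the right-hand side, the grid sizes, and the use of backslash for the periodic tridiagonal systems are assumptions made for the illustration; they are not the programs of this chapter.

```matlab
% Sketch of Algorithm 12.4: FFT along x, finite differences along y.
% Manufactured problem: Laplacian(phi) = Q with phi = sin(a*x)*cos(b*y).
Lx = 1;  Ly = 2;  nxm = 64;  nym = 128;
dx = Lx/nxm;  dy = Ly/nym;
xm = ((1:nxm)' - 0.5)*dx;  ym = ((1:nym) - 0.5)*dy;
a = 2*pi/Lx;  b = 2*pi/Ly;
phiex = sin(a*xm)*cos(b*ym);            % nxm-by-nym, zero average
Q  = -(a^2 + b^2)*phiex;                % continuous Laplacian of phiex
Qh = fft(Q);                            % FFT of each column (along x)
phih = zeros(nxm, nym);                 % row l = 1 stays zero (k_1 = 0)
e = ones(nym, 1);
for l = 2:nxm
    kl = 2/dx^2*(cos(2*pi*(l-1)/nxm) - 1);
    M = spdiags([e, (-2 + dy^2*kl)*e, e], -1:1, nym, nym);
    M(1, nym) = 1;  M(nym, 1) = 1;      % periodic corners
    phih(l, :) = ((M/dy^2) \ Qh(l, :).').';
end
phi = real(ifft(phih));                 % back to physical space
err = max(abs(phi(:) - phiex(:)));      % second-order discretization error
```

The real part is taken at the end only to discard rounding-level imaginary residue; for a real right-hand side the transformed systems preserve the conjugate symmetry of the Fourier coefficients.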
(D) Computation of the Solenoidal Field 


After solving the Poisson equation for the pressure correction, it is easy to 
correct the velocity field: 


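• for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$u^{n+1}(i, j) = u^*(i, j) - \delta t\,\frac{\phi(i, j) - \phi(\mathrm{im}(i), j)}{\delta x}, \tag{12.53}$$

$$v^{n+1}(i, j) = v^*(i, j) - \delta t\,\frac{\phi(i, j) - \phi(i, \mathrm{jm}(j))}{\delta y}. \tag{12.54}$$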


(E) Computation of the Pressure Field 


Using (12.20), the new pressure field is computed as 


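• for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$p^{n+1}(i, j) = p^n(i, j) + \phi(i, j) - \frac{\delta t}{2Re}\left[\frac{\phi(\mathrm{ip}(i), j) - 2\phi(i, j) + \phi(\mathrm{im}(i), j)}{\delta x^2} + \frac{\phi(i, \mathrm{jp}(j)) - 2\phi(i, j) + \phi(i, \mathrm{jm}(j))}{\delta y^2}\right]. \tag{12.55}$$

Finally, the pressure gradient is updated for the next time step:

• for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$\frac{\partial p^{n+1}}{\partial x}(i, j) = \frac{p^{n+1}(i, j) - p^{n+1}(\mathrm{im}(i), j)}{\delta x}, \tag{12.56}$$

• for $i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$,

$$\frac{\partial p^{n+1}}{\partial y}(i, j) = \frac{p^{n+1}(i, j) - p^{n+1}(i, \mathrm{jm}(j))}{\delta y}. \tag{12.57}$$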


Calculation of the Time Step 


The last point to discuss for our numerical algorithm is how to compute the
value of the time step $\delta t$. Since we use a semi-implicit scheme, the time step
value will be bounded through an inequality called the CFL⁸ condition. This
condition comes from a stability analysis of the scheme, which is far from
a trivial matter when one is dealing with Navier-Stokes equations.⁹ For the
applications considered in this project, a fair CFL condition would be

$$\delta t = \frac{\mathrm{cfl}}{\max\left(\left|\dfrac{u}{\delta x}\right| + \left|\dfrac{v}{\delta y}\right|\right)}, \tag{12.58}$$


where cfl < 1 is a constant that controls the time step value. In practice, we 
shall use, when possible, a constant time step, computed from the condition 
(12.58) applied to the initial flow field. 
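In MATLAB the condition (12.58) is a one-line computation. The sketch below assumes a stand-in velocity field and evaluates the sum pointwise on the arrays u and v, ignoring the half-cell offset between their locations; both choices are simplifying assumptions.

```matlab
% Sketch: time step from the CFL condition (12.58); values are illustrative.
dx = 1/64;  dy = 1/64;  cfl = 0.2;
u = ones(64, 64);  v = 0.5*ones(64, 64);     % stand-in initial velocity field
dt = cfl / max(max(abs(u)/dx + abs(v)/dy));  % constant step used for the run
```

For a field that is zero everywhere the denominator vanishes and dt becomes Inf, so a program would need a fallback value in that case.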










12.6 Flow Visualization 


An important point for numerically solving the Navier-Stokes equations is the 
postprocessing of the obtained data. Various interesting physical information 
can be extracted from a numerical field. For the unsteady flows considered in 
this project, we use visualization techniques offering an intuitive picture (even 
for a nonspecialist user) of the flow evolution. 

A simple way to visualize the simulated flow is to calculate the vorticity
vector field $\omega$ by taking the curl of the velocity. As we shall see, this is an
effective visualization means for flows dominated by large vortices (the reader
can find nice illustrations of vortex flows in the remarkable Album of Fluid
Motion by Van Dyke (1982)). For 2D flows, the vorticity vector has a single
nonzero component, perpendicular to the flow evolution plane:


8 Courant—Friedrichs—Lewy 

9 Exact CFL conditions are derived for scalar convection and wave equations in
Chap. 1. The CFL condition is also discussed for the Navier-Stokes equations
with zero viscosity (i.e., the Euler equations) in Chap. 10.
$$\omega = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}. \tag{12.59}$$

The discrete values of $\omega$ are computed at points $(x_c(i), y_c(j))$ by

$$\omega(i, j) = \frac{v(i, j) - v(\mathrm{im}(i), j)}{\delta x} - \frac{u(i, j) - u(i, \mathrm{jm}(j))}{\delta y}. \tag{12.60}$$

The isocontours of vorticity (i.e., the lines of points at which the variable takes
the same given value)¹⁰ allow one to identify vortical structures in the flow.
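A short sketch of the discrete vorticity on the staggered grid follows, checked against a simple divergence-free test field. The test field and grid sizes are assumptions made for the example; the commented contour call only indicates how isocontours could be drawn.

```matlab
% Sketch: discrete vorticity at the grid corners (xc(i), yc(j)),
% using backward differences of the staggered velocities.
nxm = 64;  nym = 64;  Lx = 1;  Ly = 1;
dx = Lx/nxm;  dy = Ly/nym;
im = [nxm, 1:nxm-1];  jm = [nym, 1:nym-1];
xc = (0:nxm-1)'*dx;   yc = (0:nym-1)*dy;
xm = xc + dx/2;       ym = yc + dy/2;
% test field: u = sin(2*pi*y) at (xc, ym), v = sin(2*pi*x) at (xm, yc)
u = repmat(sin(2*pi*ym), nxm, 1);
v = repmat(sin(2*pi*xm), 1, nym);
omega = (v - v(im,:))/dx - (u - u(:,jm))/dy;
omex  = 2*pi*cos(2*pi*xc)*ones(1,nym) - 2*pi*ones(nxm,1)*cos(2*pi*yc);
err = max(abs(omega(:) - omex(:)));
% contour(xc, yc, omega.', 20)   % isocontours; note the transpose
```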
A second visualization method consists in following the evolution of a
passive tracer (or scalar) in the flow. This numerical technique is equivalent
to experimental visualizations using smoke (for gases) or dye (for liquids).
As suggested by its name, the passive scalar does not affect the flow field
evolution; it is just transported by the velocity field, following a
convection-diffusion equation

$$\frac{\partial\chi}{\partial t} + \frac{\partial\chi u}{\partial x} + \frac{\partial\chi v}{\partial y} = \frac{1}{Pe}\Delta\chi, \tag{12.61}$$

where the dimensionless number Pe (Peclet number) quantifies the diffusion
properties of the passive tracer $\chi$. The values $\chi(i, j)$ are computed at the cell
centers $(x_m(i), y_m(j))$ following the same numerical scheme as for momentum
equations; this calculation is done at the end of each time step, allowing one
to use the velocity values of the updated (solenoidal) field.


12.7 Initial Condition 


At this point, we are able to advance the numerical solution in time, but 
we still have to make precise the starting point of the computation, or the 
initial condition. We shall see that the initial field will be constructed so as 
to trigger the unsteady flow that we wish to simulate. In principle, the initial 
condition must be compatible with the Navier-Stokes equations; in practice, 
we prescribe only the initial velocity field and set the pressure to zero values 
everywhere. The correct pressure field will be established by the calculation 
after the first time step. 

In this project we shall simulate two classes of relatively simple flows that 
illustrate basic mechanisms found in more general and complex real flows. 





Dynamics of a 2D Jet: The Kelvin—Helmholtz Instability 


The Kelvin-Helmholtz instability generally occurs in flows where shear is 
present. The basic example for this instability is the flow of two parallel 


10 MATLAB built-in functions contour or pcolor draw isocontours for a given 2D 
solution field. 
streams of different velocities and, possibly, densities. This flow can be
obtained in a simple experiment: put two immiscible liquids with a large density
difference in a long rectangular transparent box and start to slowly incline the
box. The denser liquid will start to flow into the lower part of the box, pushing
the lighter liquid into the upper part. Very nice patterns, called Kelvin cat
eyes, form at the interface between the two liquids (see Fig. 12.2 for a sketch
and Fig. 12.5 for a numerical simulation).

























Fig. 12.2. Evolution of the Kelvin-Helmholtz instability in a 2D jet: perturbation 
of the shear layer forming the contour of the jet (left) and roll-up of Kelvin (cat 
eyes) vortices (right). 





This phenomenon also occurs in jet flows that are generated by the injec- 
tion of fluid into a quiescent environment. The instability develops in the shear 
layer between the injected fluid and the fluid at rest. Our numerical simulation 
will start from an initial condition setting the velocity profile corresponding 
to the shear layers forming the contour of the jet: 








$$v(x, y) = 0, \qquad u(x, y) = u_1(y)\,(1 + u_2(x)), \tag{12.62}$$

where $u_1$ is the mean velocity profile

$$u_1(y) = \frac{U_0}{2}\left(1 + \tanh\left(\frac{1}{2}P_j\left(1 - \frac{|L_y/2 - y|}{R_j}\right)\right)\right) \tag{12.63}$$

and $u_2$ the perturbation that triggers the Kelvin-Helmholtz instability

$$u_2(x) = A_x \sin\left(2\pi\frac{x}{\lambda_x}\right). \tag{12.64}$$

Note that both velocity profiles respect the periodicity condition at the
boundaries. The parameters $U_0$, $P_j$, $R_j$, $A_x$, $\lambda_x$ will be specified later for numerical
applications.
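The initial condition (12.62)-(12.64) can be built with an outer product of the two 1D profiles. The parameter values below are hypothetical placeholders (the actual values are given later with the numerical applications), and the grid names are assumptions.

```matlab
% Sketch: jet initial condition (12.62)-(12.64) on the staggered grid.
% All parameter values are hypothetical placeholders.
Lx = 2;  Ly = 1;  nxm = 64;  nym = 64;
dx = Lx/nxm;  dy = Ly/nym;
xc = (0:nxm-1)'*dx;                % u is located at (xc(i), ym(j))
ym = ((1:nym) - 0.5)*dy;
U0 = 1;  Pj = 20;  Rj = Ly/4;      % jet velocity, profile steepness, jet half-width
Ax = 0.5;  lambdax = 0.5*Lx;       % perturbation amplitude and wavelength
u1 = U0/2*(1 + tanh(0.5*Pj*(1 - abs(Ly/2 - ym)/Rj)));   % 1-by-nym mean profile
u2 = Ax*sin(2*pi*xc/lambdax);                           % nxm-by-1 perturbation
u = (1 + u2)*u1;                   % u(i,j) = u1(y_j)*(1 + u2(x_i))
v = zeros(nxm, nym);
```

The wavelength is chosen so that Lx is a multiple of lambdax; otherwise the perturbation would not be periodic over the domain.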
Evolution of a Vortex Dipole





Vortices generated by the Kelvin—Helmholtz instability all rotate in the same 
sense, i.e., they have vorticities of the same sign. A vortex dipole is a pair 
of vortices of opposite signs. This configuration is encountered in many areas 
of practical interest (meteorological and coastal flows, trailing vortices from 
aircraft, 2D turbulence, swirled injection in stratified charge engines). We con- 
sider here symmetric dipoles for which the two vortices have the same vorticity 
magnitude; this is a stable structure that propagates along its axis of sym- 
metry with a quasiconstant translation velocity generated by a self-induction 
mechanism. The reader interested in studying vortex motion is referred to 
Batchelor (1988) and Saffman (1992). 
Fig. 12.3. Velocity field of a single vortex (left) and of a vortex dipole (right).


We shall numerically construct a vortex dipole by superimposing the
velocity fields of two individual vortices (see Fig. 12.3). Each vortex, defined by
its center $(x_v, y_v)$, size $l_v$, and intensity $\psi_0$, is analytically described by the
following stream-function:

$$\psi(x, y) = \psi_0 \exp\left(-\frac{(x - x_v)^2 + (y - y_v)^2}{l_v^2}\right). \tag{12.65}$$

The stream-function is used to derive the velocity components as

$$u = \frac{\partial\psi}{\partial y} = -2\,\frac{(y - y_v)}{l_v^2}\,\psi(x, y), \qquad v = -\frac{\partial\psi}{\partial x} = 2\,\frac{(x - x_v)}{l_v^2}\,\psi(x, y). \tag{12.66}$$

The dipole is now assembled by taking two vortices of the same size $l_v$ but
opposite intensities $\pm\psi_0$ and placing them symmetrically about a chosen line,
which will be the propagation direction. For example, a dipole propagating to
the right along the x axis will be defined by (see Fig. 12.3)

vortex 1: $+\psi_0$, $l_v$, $x_v$, $y_v = L_y + a$,

vortex 2: $-\psi_0$, $l_v$, $x_v$, $y_v = L_y - a$,

where a is the distance separating the vortex centers.

It is important to note that this is not a rigorous method to construct a
vortex dipole, since the initial condition is not compatible with the Navier-Stokes
equations.¹¹ However, the velocity and pressure fields will be automatically
adjusted to satisfy the equations after the first time step of the numerical
simulation.
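The superposition of two Gaussian vortices can be sketched as follows. For brevity both velocity components are evaluated at the cell centers rather than on the staggered locations, and the numerical values (psi0, lv, xv, the symmetry line y0, and the offset a) are hypothetical choices for the illustration.

```matlab
% Sketch: velocity field of a vortex dipole from the stream function (12.65).
% Values are hypothetical; both components are evaluated at cell centers.
Lx = 1;  Ly = 1;  nxm = 64;  nym = 64;  dx = Lx/nxm;  dy = Ly/nym;
[X, Y] = ndgrid(((1:nxm) - 0.5)*dx, ((1:nym) - 0.5)*dy);
psi0 = 0.01;  lv = 0.1;  xv = Lx/4;  y0 = Ly/2;  a = 0.05;
psi   = @(s, yv) s*psi0*exp(-((X - xv).^2 + (Y - yv).^2)/lv^2);
vel_u = @(s, yv) -2*(Y - yv)/lv^2 .* psi(s, yv);   % u =  d(psi)/dy
vel_v = @(s, yv)  2*(X - xv)/lv^2 .* psi(s, yv);   % v = -d(psi)/dx
u = vel_u(+1, y0 + a) + vel_u(-1, y0 - a);         % vortex 1 + vortex 2
v = vel_v(+1, y0 + a) + vel_v(-1, y0 - a);
```

With the positive vortex above the symmetry line, the induced velocity between the two centers points in the +x direction, which is why this arrangement propagates to the right.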


12.8 Step-by-Step Implementation 


The survivors of previous lengthy theoretical developments may now start to 
implement the numerical algorithm to simulate some physical flows. Since this 
is a delicate process, we shall proceed step by step to construct our Navier- 
Stokes code. We adopt the programming strategy of building specialized pro- 
gram modules that will be first validated on simpler problems for which an 
exact solution is known. We start with some preliminary questions. 


12.8.1 Solving a Linear System with Tridiagonal, Periodic Matrix 


Since there exist different MATLAB built-in functions to solve linear systems,
this part is not compulsory for the following numerical developments.
Nevertheless, we consider that the reader should be aware of the structure of the
involved systems and efficient algorithms to solve them.¹² Moreover, the
algorithms in this section can be used in applications using other (less-friendly)
programming languages.

We shall use the particular pattern (tridiagonal, periodic) of matrices 
(12.39), (12.41), (12.51) to build an efficient numerical algorithm to solve the 
corresponding linear systems. We start by presenting the well-known Thomas 
algorithm for solving tridiagonal systems. 


Algorithm 12.5. Thomas algorithm for tridiagonal systems.¹³ The tridiagonal
system


11 The reader can test more rigorous analytical models for the vortex dipole as, for 
example, the Lamb—Chaplygin dipole, which corresponds to a steady solution of 
the 2D Euler equations; see Batchelor (1988); Saffman (1992). 

12 It is always interesting to know what happens behind the magical MATLAB
command x = A\b that solves the system Ax = b.

13 We can easily show that this algorithm is a particular form of the Gauss elimina- 
tion method. 
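$$\begin{pmatrix}
b_1 & c_1 & 0 & \cdots & 0 & 0 \\
a_2 & b_2 & c_2 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & a_{n-1} & b_{n-1} & c_{n-1} \\
0 & 0 & \cdots & 0 & a_n & b_n
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n-1} \\ X_n \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}$$

is solved by introducing the following recurrence relation:

$$X_k = \gamma_k - \frac{c_k}{\beta_k}X_{k+1}, \quad k = 1, \ldots, (n - 1). \tag{12.67}$$

Inserting these relations in the initial form of the system, we can calculate the
coefficients $\gamma_k$ and $\beta_k$:

$$\beta_1 = b_1, \qquad \beta_k = b_k - \frac{c_{k-1}}{\beta_{k-1}}a_k, \quad k = 2, \ldots, n,$$

$$\gamma_1 = \frac{f_1}{\beta_1}, \qquad \gamma_k = \frac{f_k - a_k\gamma_{k-1}}{\beta_k}, \quad k = 2, \ldots, n.$$

After computing the coefficients $\gamma_k$ and $\beta_k$, the unknowns $X_k$ are immediately
obtained from (12.67) by a backward substitution starting from the known
value $X_n = \gamma_n$.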
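The recurrences above can be sketched for a single system as follows. The test matrix and the variable names (bet, gam, chosen to avoid shadowing the MATLAB functions beta and gamma) are assumptions made for the example.

```matlab
% Sketch of Algorithm 12.5 for one tridiagonal system.
% a: sub-diagonal (a(1) unused), b: diagonal, c: super-diagonal (c(n) unused)
n = 6;
a = -ones(n,1);  b = 4*ones(n,1);  c = -ones(n,1);
Xex = (1:n)';                                  % chosen solution
A = diag(b) + diag(a(2:n), -1) + diag(c(1:n-1), 1);
f = A*Xex;                                     % matching right-hand side
bet = zeros(n,1);  gam = zeros(n,1);  X = zeros(n,1);
bet(1) = b(1);  gam(1) = f(1)/bet(1);
for k = 2:n                                    % forward sweep
    bet(k) = b(k) - c(k-1)/bet(k-1)*a(k);
    gam(k) = (f(k) - a(k)*gam(k-1))/bet(k);
end
X(n) = gam(n);
for k = n-1:-1:1                               % backward substitution (12.67)
    X(k) = gam(k) - c(k)/bet(k)*X(k+1);
end
err = max(abs(X - Xex));
```

No pivoting is performed, so the sketch relies on the matrix being diagonally dominant (or otherwise safe for elimination without row exchanges).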


We now note that a periodic tridiagonal matrix has supplementary nonzero 
coefficients in the upper-right and lower-left corners. The idea of the following 
algorithm is to eliminate these intruders and to work with tridiagonal systems. 


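Algorithm 12.6. Thomas algorithm for tridiagonal, periodic systems. The
system

$$\begin{pmatrix}
b_1 & c_1 & 0 & \cdots & 0 & 0 & a_1 \\
a_2 & b_2 & c_2 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1} & b_{n-1} & c_{n-1} \\
c_n & 0 & 0 & \cdots & 0 & a_n & b_n
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n-1} \\ X_n \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}$$

is reexpressed as¹⁴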


14 This decomposition is similar to the Sherman-Morrison formula.
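$$\left(\begin{array}{cccccc|c}
b_1^* & c_1 & 0 & \cdots & 0 & 0 & v_1 \\
a_2 & b_2 & c_2 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & a_{n-1} & b_{n-1} & c_{n-1} & 0 \\
0 & 0 & \cdots & 0 & a_n & b_n^* & v_n \\ \hline
-1 & 0 & \cdots & 0 & 0 & -1 & 1
\end{array}\right)
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n-1} \\ X_n \\ X^* \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \\ 0 \end{pmatrix}, \tag{12.68}$$

where

$$v_1 = a_1, \quad v_n = c_n, \qquad b_1^* = b_1 - a_1, \quad b_n^* = b_n - c_n, \qquad \text{and} \quad X^* = X_1 + X_n.$$

An equivalent form of (12.68) is

$$\underbrace{\begin{pmatrix}
b_1^* & c_1 & 0 & \cdots & 0 & 0 \\
a_2 & b_2 & c_2 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & a_n & b_n^*
\end{pmatrix}}_{M^*}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
+ X^* \begin{pmatrix} v_1 \\ 0 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix},$$

together with

$$X^* = X_1 + X_n.$$

We now seek a solution of the form

$$X_k = X_k^{(1)} - X_k^{(2)}\,X^*, \quad k = 1, \ldots, n, \tag{12.69}$$

with the vectors $X^{(1)}$ and $X^{(2)}$ solutions of two tridiagonal systems of size n:

$$M^* X^{(1)} = (f_1\ f_2\ \ldots\ f_{n-1}\ f_n)^T, \qquad M^* X^{(2)} = (v_1\ 0\ \ldots\ 0\ v_n)^T. \tag{12.70}$$

Finally, the supplementary unknown is calculated as

$$X^* = \frac{X_1^{(1)} + X_n^{(1)}}{1 + X_1^{(2)} + X_n^{(2)}}. \tag{12.71}$$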

To summarize, the algorithm consists of the following steps:


• solve the two tridiagonal systems (12.70) using the Thomas algorithm 12.5;
note that the program for this step can be optimized since both systems
share the same matrix M*;

• compute X* from (12.71);

• compute the final solution using (12.69).
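The three steps can be checked on a small random system. In this sketch the two tridiagonal solves are delegated to backslash (Algorithm 12.5 could be substituted), and the test matrix and variable names are assumptions made for the illustration.

```matlab
% Sketch of Algorithm 12.6 for one periodic tridiagonal system.
n = 8;
a = -rand(n,1);  c = -rand(n,1);  b = 3 + rand(n,1);   % diagonally dominant
A = diag(b) + diag(a(2:n), -1) + diag(c(1:n-1), 1);
A(1,n) = a(1);  A(n,1) = c(n);           % periodic corner coefficients
Xex = rand(n,1);  f = A*Xex;             % manufactured right-hand side
v = zeros(n,1);  v(1) = a(1);  v(n) = c(n);
Ms = A;  Ms(1,n) = 0;  Ms(n,1) = 0;      % M*: corners removed ...
Ms(1,1) = b(1) - a(1);                   % ... b_1* = b_1 - a_1
Ms(n,n) = b(n) - c(n);                   % ... b_n* = b_n - c_n
X12 = Ms \ [f, v];                       % X^(1) and X^(2) in one solve (12.70)
Xs  = (X12(1,1) + X12(n,1)) / (1 + X12(1,2) + X12(n,2));   % X* from (12.71)
X   = X12(:,1) - X12(:,2)*Xs;            % final solution (12.69)
err = max(abs(X - Xex));
```

Solving for both right-hand sides in a single call reflects the remark that the two systems share the same matrix M*.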
Exercise 12.1. Write a MATLAB function
function fi=NSE_trid_per_c2D(aa,ab,ac,fi)


that solves simultaneously m systems with tridiagonal, periodic matrices.
Algorithm 12.6 is used to solve each system j (for 1 ≤ j ≤ m) defined as follows:


for i=1,...,n
aa(j,i)*X(j,i-1)+ab(j,i)*X(j,i)+ac(j,i)*X(j,i+1)=fi(j,i),
with periodicity conditions X(j,0)=X(j,n), X(j,n+1)=X(j,1).


Hint: use vectorial programming to apply the relations of the algorithm
simultaneously to all m systems; for example, the program lines computing the
coefficients $b_1^*$, $b_n^*$ of the matrix M* from (12.68) are written as


ab(:,1)=ab(:,1)-aa(:,1);
ab(:,n)=ab(:,n)-ac(:,n);


which implies that the computation is done for all j (row) indices.
Test this function using as model the MATLAB script NSE_test_trid.m.¹⁵


12.8.2 Solving the Unsteady Heat Equation 


Study of the 2D unsteady heat equation provides the ideal framework for test- 
ing the procedures that will constitute the core of this project: the Helmholtz 
and Poisson solvers. We consider the unsteady heat equation (see Chap. 1 for 
the 1D equation) 


$$\frac{\partial u}{\partial t} - \Delta u(t, x, y) = f(x, y), \quad \text{for } (x, y) \in \Omega = [0, L_x] \times [0, L_y], \tag{12.72}$$

with periodic boundary conditions and initial condition $u(0, x, y) = u^0(x, y)$.


This equation will be numerically integrated in time until a steady
(equilibrium) solution is reached. This solution satisfies

$$-\Delta u_s(x, y) = f(x, y), \quad \text{for } (x, y) \in [0, L_x] \times [0, L_y], \tag{12.73}$$

with the same periodic boundary conditions. The steady solution $u_s(x, y)$ may
be interpreted as the limit for $t \to \infty$ of the unsteady solution u(t, x, y).


15 Although this script is intended to be straightforward, some comments may be
helpful: the (diagonal) vectors aa,ab,ac are filled with random values; for each
j, the matrix A of the system is reconstructed and transformed into a diagonally
dominant matrix that is known to be invertible; the right-hand side of the system
is computed as f = A × X, where X is arbitrarily fixed; every system is solved
using the MATLAB syntax X = A\f; the function NSE_trid_per_c2D is validated
if the returned solution matches X to within rounding errors. This is a commonly
used technique to test programs that solve linear systems.
We adopt in this section the following procedure to test the programs. We
set the right-hand-side function

$$f(x, y) = (a^2 + b^2)\sin(ax)\cos(by), \quad \text{where } a = \frac{2\pi}{L_x}, \ b = \frac{2\pi}{L_y}, \tag{12.74}$$

which satisfies the periodicity along x and y. For this choice, the exact solution
of (12.73) is

$$u_{ex}(x, y) = \sin(ax)\cos(by). \tag{12.75}$$

Indeed, it is obvious that f was chosen such that $f(x, y) = -\Delta u_{ex}$. Using
f(x, y) as input data in the programs, we get a numerical solution that has
to fit the exact (analytical) solution. If this is not the case, debugging is
necessary!
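For completeness, the relation between f and the exact solution can be verified in two lines, using only (12.74) and (12.75):

```latex
\frac{\partial^2 u_{ex}}{\partial x^2} = -a^2 \sin(ax)\cos(by), \qquad
\frac{\partial^2 u_{ex}}{\partial y^2} = -b^2 \sin(ax)\cos(by)
\quad\Longrightarrow\quad
-\Delta u_{ex} = (a^2 + b^2)\sin(ax)\cos(by) = f(x, y).
```

Periodicity follows from $aL_x = bL_y = 2\pi$: for instance, $\sin(a(x + L_x)) = \sin(ax + 2\pi) = \sin(ax)$.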


Explicit Solver 


The simplest method to solve (12.72) is based on the explicit Euler scheme
(see Chap. 1)

$$u^{n+1} = u^n + \delta t\,(f + \Delta u^n). \tag{12.76}$$

Assuming that the solution is computed at grid points $(x_c, y_m)$ (see Fig. 12.1),
the discrete form of the scheme becomes ($i = 1, \ldots, n_{xm}$, $j = 1, \ldots, n_{ym}$):

$$u^{n+1}(i, j) = u^n(i, j) + \delta t\left[f(i, j) + \frac{u^n(\mathrm{ip}(i), j) - 2u^n(i, j) + u^n(\mathrm{im}(i), j)}{\delta x^2} + \frac{u^n(i, \mathrm{jp}(j)) - 2u^n(i, j) + u^n(i, \mathrm{jm}(j))}{\delta y^2}\right]. \tag{12.77}$$


The time integration starts from the initial condition u? = 0 and stops when 
the convergence to a steady solution is reached. We impose the following 
numerical convergence criterion: 


6S lah a p10", (12.78) 

where the norm is defined as $\|\varphi\|_2 = \left(\int_\Omega \varphi^2\,dx\,dy\right)^{1/2}$.
The drawback of this scheme in computing steady solutions is that the
time step value is limited by a stability (CFL) condition. For the 2D heat
equation, the CFL condition is expressed as (see, for instance, Hirsch (1988)):

$$\delta t\left(\frac{1}{\delta x^2} + \frac{1}{\delta y^2}\right) = \frac{\mathrm{cfl}}{2}, \quad \mathrm{cfl} \le 1. \tag{12.79}$$
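The explicit scheme and its time-step restriction can be tried out in a few lines. This is a sketch under stated assumptions, not the solution proposed in Sect. 12.9: the stopping tolerance 1e-8, the value cfl = 0.9, the iteration cap, and all names are choices made only for this example.

```matlab
% Sketch: explicit scheme (12.77) iterated to a steady state.
Lx = 1;  Ly = 2;  nxm = 20;  nym = 50;  dx = Lx/nxm;  dy = Ly/nym;
ip = [2:nxm, 1];  im = [nxm, 1:nxm-1];  jp = [2:nym, 1];  jm = [nym, 1:nym-1];
xc = (0:nxm-1)'*dx;  ym = ((1:nym) - 0.5)*dy;
a = 2*pi/Lx;  b = 2*pi/Ly;
uex = sin(a*xc)*cos(b*ym);  f = (a^2 + b^2)*uex;   % (12.75) and (12.74)
cfl = 0.9;  dt = 0.5*cfl/(1/dx^2 + 1/dy^2);        % time step from (12.79)
lap = @(w) (w(ip,:) - 2*w + w(im,:))/dx^2 + (w(:,jp) - 2*w + w(:,jm))/dy^2;
u = zeros(nxm, nym);  res = 1;  nit = 0;
while res > 1e-8 && nit < 100000                   % assumed tolerance and cap
    unew = u + dt*(f + lap(u));
    res = sqrt(dx*dy*sum(sum((unew - u).^2)));     % discrete L2 norm of the update
    u = unew;  nit = nit + 1;
end
err = max(abs(u(:) - uex(:)));                     % second-order spatial error
```

The remaining difference between u and uex at convergence is the spatial discretization error, not a lack of convergence in time.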


Exercise 12.2. Compute the solution of the unsteady heat equation (12.72)
using the explicit scheme (12.77). Compare the obtained steady solution to the
exact solution (12.75). Use the following input parameters: $L_x = 1$, $L_y = 2$,
$n_x = 21$, $n_y = 51$, cfl = 1. Hints:
• the grid parameters may be defined as global variables;

• write modular programs with specialized functions that can be reused for
subsequent applications; for example, write separate functions to compute f,
$\Delta u^n$, to plot the solution, etc.;

• use a while loop for the time advancement, which allows one to easily
implement the convergence criterion;

• avoid for loops and use vectorial programming, more compact and easier
to compare with mathematical relations; for example, the following function
computes the discrete values of the Laplacian $\Delta u^n$ using directly the array
u (of size $n_{xm} \times n_{ym}$) and the vectors ip, jp, im, jm defined by (12.28) and
(12.29):


function hc=NSE_calc_lap(u)
% discrete Laplacian of u with periodic boundary conditions
global dx dy
global im ip jp jm ic jc
% ic, jc are presumably the plain index vectors 1:nxm and 1:nym
hc=(u(ip,jc)-2*u+u(im,jc))/(dx*dx)+(u(ic,jp)-2*u+u(ic,jm))/(dy*dy);


• plot in the same figure the isocontours of the numerical steady solution and
the exact solution (12.75).
A solution of this exercise is proposed in Sect. 12.9 at page 277.


Figure 12.4 displays a typical result for the isocontours of the steady
solution. Note that the numerical and exact solutions are difficult to distinguish
in the plot (a more quantitative comparison can be made by computing
$\|u - u_{ex}\|_2$). The slow convergence to the steady solution is also illustrated
in the same figure. We conclude that the explicit solver is easy to implement 
but requires small time steps and, consequently, large computational times. 
This suggests that an implicit solver able to take larger time steps is more 
appropriate for this problem. 


Implicit Solver 


We use the combined Adams-Bashforth and Crank-Nicolson schemes
described previously to discretize (12.72):

$$\frac{u^{n+1} - u^n}{\delta t} = \underbrace{\frac{3}{2}H^n - \frac{1}{2}H^{n-1}}_{\text{Adams-Bashforth}} + \underbrace{\frac{1}{2}\Delta\left(u^{n+1} + u^n\right)}_{\text{Crank-Nicolson}}, \tag{12.80}$$

where in this case, the term $H^n = H^{n-1} = f(x, y)$ does not depend on time.
We finally get the Helmholtz equation

$$\left(I - \frac{\delta t}{2}\Delta\right)\delta u = \delta t\,(f + \Delta u^n), \quad \text{with } \delta u = u^{n+1} - u^n, \tag{12.81}$$

which is solved using the ADI method with the following steps:
Fig. 12.4. Test of the explicit solver for the unsteady heat equation. Superposition
of isocontours of the steady numerical and exact solutions (left) and convergence
history (right) for $\varepsilon = \|u^{n+1} - u^n\|_2$.


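$$\left(I - \frac{\delta t}{2}\frac{\partial^2}{\partial x^2}\right)\overline{\delta u} = \delta t\,(f + \Delta u^n) \quad + \text{ periodicity along } x,$$

$$\left(I - \frac{\delta t}{2}\frac{\partial^2}{\partial y^2}\right)\delta u = \overline{\delta u} \quad + \text{ periodicity along } y. \tag{12.82}$$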





Using second-order centered finite differences to discretize second derivatives, 
we obtain two linear systems with tridiagonal, periodic matrices. These sys- 
tems are solved using the function NSE_trid_per_c2D written for the previous 
exercise. 


Remark 12.2. The semi-implicit solver defined in this section is unconditionally
stable, allowing arbitrarily large time steps $\delta t$. Compared to the explicit
solver, it requires much less computational time to reach the steady solution
(more work per time step but very few time steps to converge).


Exercise 12.3. Resume Exercise 12.2 and implement the implicit solver. The 
time step will be computed using (12.79) taking cfl = 100. Evaluate the neces- 
sary computing time to reach the steady solution and compare to the explicit 
solver. Hint: noticing that the coefficients of the matrices involved in the ADI 
steps are constant in time, optimize the function NSE_trid_per_c2D by: 

• storing the coefficients of the matrices in vectors and not in two-dimensional
arrays;

• computing all the quantities not depending on the right-hand side only once,
before the while loop.

A solution of this exercise is proposed in Sect. 12.9 at page 277. 








Exercise 12.4. Consider now the following nonlinear convection—diffusion 
equation: 


+ $\Delta u = f(x,y)$, for $(x,y) \in [0, L_x] \times [0, L_y]$, (12.83)

with periodic boundary conditions and initial condition $u^0(x, y) = 0$.


1. Choose the analytical form of the right-hand-side function f(x,y) such 
that (12.75) is the steady solution of (12.83). 

2. Use the implicit scheme (12.80) to solve this equation. Compare the results
to the exact solution. Hint: use the previous program and modify only the
function computing H (be careful, for this equation H varies in time, and
consequently, $H^n \neq H^{n-1}$).

3. The time step is considered as constant and given by (12.79). Find the 
stability limit (cflmax) for a given space discretization. 


A solution of this exercise is proposed in Sect. 12.9 at page 278. 


12.8.3 Solving the Steady Heat Equation Using FFTs 


We now write and test the necessary functions for solving the Poisson equation 
(Algorithm 12.4). We use as a test case the heat equation (12.73). 








Exercise 12.5. 1. Solve the steady heat equation (12.73) with the right-hand
side (12.74) using the Poisson solver described in Sect. 12.5 (i.e., FFT
along x and finite differences along y). Compare to the exact solution.¹⁶
Input parameters: $L_x = 1$, $L_y = 2$, $n_x = 65$, $n_y = 129$.

2. Optimize (see Exercise 12.3) the function solving the tridiagonal system. 
3. Solve numerically the same equation using two FFTs. 
A solution of this exercise is proposed in Sect. 12.9 at page 278. 
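
A possible shape of the two-FFT solver of item 3 is sketched below; it is not the script distributed with the book. The sketch assumes that the steady equation is written as $-\Delta u = f$, uses a manufactured right-hand side instead of (12.74), and divides by the eigenvalues of the second-order centered differences (an assumption made to stay consistent with the finite difference solvers). The last lines apply the additive-constant correction at the point $i = j = 1$.

```matlab
% Sketch only: periodic Poisson problem -Lap(u) = f solved with two FFTs.
Lx = 1; Ly = 2; nx = 64; ny = 128;         % distinct periodic points (65 and 129 with the repeated one)
hx = Lx/nx; hy = Ly/ny;
[X, Y] = ndgrid((0:nx-1)*hx, (0:ny-1)*hy);
mode = sin(2*pi*X/Lx) .* cos(2*pi*Y/Ly);
uex  = mode + 0.3;                         % assumed exact solution, with an arbitrary constant
f    = ((2*pi/Lx)^2 + (2*pi/Ly)^2) * mode; % f = -Lap(uex)

% Eigenvalues of minus the centered second differences, per wave number
lamx = (2 - 2*cos(2*pi*(0:nx-1)/nx)) / hx^2;
lamy = (2 - 2*cos(2*pi*(0:ny-1)/ny)) / hy^2;
[LX, LY] = ndgrid(lamx, lamy);
lam = LX + LY;
lam(1,1) = 1;                              % mean mode: avoid dividing by zero

uh = fft2(f) ./ lam;                       % solve in Fourier space
uh(1,1) = 0;                               % the mean of the solution is arbitrary: set it to zero
unum = real(ifft2(uh));

% The solution is known up to a constant: match it at i = j = 1
unum = unum + (uex(1,1) - unum(1,1));
err  = max(abs(unum(:) - uex(:)));         % second-order discretization error
```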


12.8.4 Solving the 2D Navier—Stokes Equations 


We are now ready to assemble all the modules previously developed to solve 
the Navier-Stokes equations and simulate the flows described in Sect. 12.7. 


Exercise 12.6. Write a Navier-Stokes solver for two-dimensional periodic 
flows. 
Hints for the structure of the program: 


• define the global variables,
• set the input parameters,
• build the 2D grid and related arrays,
• define the arrays to store the flow variables and initialize them to zero,
• set the initial condition corresponding to the simulated flow (see below the
parameters for the suggested run cases),
• visualize the initial field,


¹⁶ As we have already seen, the numerical solution $u_{num}$ is computed up to an
additive constant. To compare to the exact solution $u_{ex}$, we have to calculate
this constant by imposing that the two solutions be identical at a chosen point
($i = j = 1$, for example). We then compare $u_{ex}$ to $u_{num} + (u_{ex}(1,1) - u_{num}(1,1))$.
• compute the time step,
• compute the variables for the optimization of the ADI method and Poisson
solver (i.e., all variables or coefficients not depending on time),
• start the time loop:
  - solve the momentum equation for u,
  - solve the momentum equation for v,
  - compute the divergence of the nonsolenoidal field,
  - solve the Poisson equation,
  - correct the velocity field,
  - compute the pressure,
  - update the pressure gradient for the next step,
  - solve the equation for the passive scalar,
  - check that the divergence of the velocity field is zero,
  - visualize the flow field by plotting the isocontours of vorticity and
    passive scalar.
• end of the time loop.
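
The projection part of the time loop (divergence of the nonsolenoidal field, Poisson equation, velocity correction, divergence check) can be sketched in isolation as follows. This is an illustration under explicit assumptions and not the book's solver: derivatives are computed with Fourier differentiation on an odd number of points instead of the finite differences of the text, the intermediate field (us, vs) is an arbitrary smooth periodic field, and the correction is assumed to read $q^{n+1} = q^* - \delta t\,\nabla\phi$.

```matlab
% Sketch only: one projection step on a periodic square, with spectral derivatives.
n = 33; L = 1; h = L/n; dt = 1e-2;         % odd n: no unpaired Nyquist mode
[X, Y] = ndgrid((0:n-1)*h);
k = (2*pi/L) * [0:(n-1)/2, -(n-1)/2:-1];   % wave numbers in FFT ordering
[KX, KY] = ndgrid(k);

% Arbitrary nonsolenoidal intermediate velocity (hypothetical data)
us = sin(2*pi*X).*cos(2*pi*Y) + 0.5*cos(4*pi*X);
vs = cos(2*pi*X).*sin(4*pi*Y) + 0.2*sin(2*pi*X).*sin(2*pi*Y);

% Divergence operator in Fourier space
divf = @(u, v) real(ifft2(1i*KX.*fft2(u) + 1i*KY.*fft2(v)));
d0 = divf(us, vs);                         % divergence before correction

% Poisson equation Lap(phi) = div(q*)/dt, solved mode by mode
K2 = KX.^2 + KY.^2;
K2(1,1) = 1;                               % mean mode: avoid dividing by zero
phih = -fft2(d0) ./ (dt*K2);
phih(1,1) = 0;                             % phi is defined up to a constant

% Velocity correction q = q* - dt*grad(phi)
u = us - dt*real(ifft2(1i*KX.*phih));
v = vs - dt*real(ifft2(1i*KY.*phih));
d1 = divf(u, v);                           % divergence after correction
fprintf('max |div| before: %g, after: %g\n', max(abs(d0(:))), max(abs(d1(:))));
```

After the correction, the divergence drops to round-off level, which is the check performed at each time step of the full solver.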








A solution of this exercise is proposed in Sect. 12.9 at page 278. 


Run cases. The expected results are illustrated in the figures at the end 
of the chapter. 





1. 2D jet: Kelvin-Helmholtz instability; input parameters
$L_x = 2$, $L_y = 1$, $n_x = 65$, $n_y = 65$, cfl $= 0.2$,
$Re = 1000$, $Pe = 1000$, $U_0 = 1$, $P_j = 20$, $R_j = L_y/4$, $A_x = 0.5$, $\lambda_x = 0.5\,L_x$.
The initial field for the passive scalar is identical to the field of u.
2. Same configuration, but changing cfl $= 0.1$, $\lambda_x = 0.25\,L_x$.
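3. Vortex dipole:
$L_x = 1$, $L_y = 1$, $n_x = 65$, $n_y = 65$, cfl $= 0.4$, $Re = 1000$, $Pe = 1000$;
vortex 1:
$\psi_0 = +0.01$, $x_v = L_x/4$, $y_v = L_y/2 + 0.05$,
$l_v = 0.4\sqrt{2}\,\min\{x_v,\ y_v,\ L_x - x_v,\ L_y - y_v\}$;
vortex 2:
$\psi_0 = -0.01$, $x_v = L_x/4$, $y_v = L_y/2 - 0.05$,
$l_v = 0.4\sqrt{2}\,\min\{x_v,\ y_v,\ L_x - x_v,\ L_y - y_v\}$.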


The initial field for the passive scalar is set to a large stripe, placed in the 
middle of the computational domain. For example, take 


$\chi(i, j) = 1$, if $n_{xm}/2 - 10 < i < n_{xm}/2 + 10$,
$\chi(i, j) = 0$, otherwise.
4. Add to the previous configuration a second dipole propagating in the 
opposite direction. 


5. Imagine other flow configurations with several dipoles in the computa- 
tional domain. 
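
For the vortex dipole run case, the stripe of passive scalar and the vortex size $l_v$ can be set up as in the short sketch below. The sketch assumes that $n_{xm}$ denotes $n_x - 1$ (the number of distinct points in a periodic direction), that the stripe bounds are strict inequalities, and that the first vortex is centered at $x_v = L_x/4$, $y_v = L_y/2 + 0.05$; the variable names are illustrative.

```matlab
% Sketch only: passive scalar stripe and vortex size for the dipole run case.
Lx = 1; Ly = 1; nx = 65; ny = 65;
nxm = nx - 1; nym = ny - 1;                % assumed meaning of n_xm, n_ym

% Stripe in the middle of the domain, constant along the second index
chi = zeros(nxm, nym);
i = 1:nxm;
stripe = (i > nxm/2 - 10) & (i < nxm/2 + 10);
chi(stripe, :) = 1;

% Size of vortex 1 (assumed center position)
xv = Lx/4; yv = Ly/2 + 0.05;
lv = 0.4*sqrt(2)*min([xv, yv, Lx - xv, Ly - yv]);
```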




12.9 Solutions and Programs 


The MATLAB scripts for this project are organized in two directories: 


• NSE_QP containing the solution scripts for all preliminary questions
(Exercises 12.1 to 12.5),
• NSE_QNS containing the Navier-Stokes solver (Exercise 12.6).


There is also a third directory named NSE_INTERFACE in which a different
programming philosophy is illustrated. All the solution scripts of the
project are called from a graphical user interface (GUI) and the results are 
displayed interactively. A supplementary Navier-Stokes run case is computed 
in this version. To launch the interface, just run the script Main from the 
subdirectory Tutorial. 


Solution of Exercise 12.1 (Solving a Tridiagonal, Periodic System) 


The MATLAB script NSE_trid_per_c2D.m contains the function that solves 
simultaneously m linear systems with tridiagonal, periodic matrices of size n. 
Numerous comments in the script are intended to guide the reader through 
the steps of Algorithm 12.6. Memory storage was optimized using a minimum 
number of arrays for computing intermediate coefficients. Note also the
vectorized programming of the algorithm.

We recall that this function is called (and tested) by the script NSE_test_trid.m. 
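
For readers who want to experiment before opening the script, here is a minimal sketch of one classical way to solve several periodic tridiagonal systems at once. It is not a transcription of Algorithm 12.6 or of NSE_trid_per_c2D: it uses the Sherman-Morrison formula to remove the two corner entries, MATLAB's sparse backslash for the remaining tridiagonal solves, and constant coefficients a, b, c chosen arbitrarily.

```matlab
% Sketch only: m periodic tridiagonal systems of size n solved simultaneously
% with the Sherman-Morrison formula (coefficients a, b, c are arbitrary test values).
n = 8; m = 3;
a = -1; b = 4; c = -1;                     % sub-, main and super-diagonal
e = ones(n,1);

% Reference periodic matrix, used only to check the result
A = spdiags([a*e b*e c*e], [-1 0 1], n, n);
A(1,n) = a; A(n,1) = c;
F = reshape(1:n*m, n, m);                  % one right-hand side per column

% Write A = B + u*v', with B purely tridiagonal
gam = -b;
d = b*e; d(1) = b - gam; d(n) = b - a*c/gam;
B = spdiags([a*e d c*e], [-1 0 1], n, n);
u = zeros(n,1); u(1) = gam; u(n) = c;
v = zeros(n,1); v(1) = 1;   v(n) = a/gam;

% Two tridiagonal solves, then the rank-one correction for all columns at once
Ysol = B \ F;
z    = B \ u;
Xsol = Ysol - z * ((v'*Ysol) / (1 + v'*z));

res = max(max(abs(A*Xsol - F)));           % residual of the periodic systems
```

The correction is applied to all m columns in a single matrix operation, which is the same idea as the vectorized programming used in the book's function.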





Solution of Exercise 12.2 (Explicit Solver for the Unsteady Heat 
Equation) 





The script NSE_Qexp_lap.m is straightforward to read and execute. Nevertheless,
it is useful to indicate the specialized functions called by this program:





• NSE_calc_lap: computes the Laplacian $\Delta u$;
• NSE_fsource: computes the right-hand term (or source term) $f$;
• NSE_fexact: computes the exact solution;
• NSE_norm_L2: computes the norm $\|u\|_2$;
• NSE_visu_isos: plots in the same figure the isocontours of the numerical
and exact solutions.
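
As an indication of what the first and fourth helpers may look like, the sketch below computes a periodic five-point Laplacian with circshift and a discrete $L^2$ norm. These are stand-ins written for this note, not the contents of NSE_calc_lap.m or NSE_norm_L2.m; in particular, weighting the norm by the cell area hx*hy is an assumption.

```matlab
% Sketch only: periodic Laplacian (centered differences) and discrete L2 norm.
Lx = 1; Ly = 2; nx = 64; ny = 64;
hx = Lx/nx; hy = Ly/ny;
[X, Y] = ndgrid((0:nx-1)*hx, (0:ny-1)*hy);
u = sin(2*pi*X/Lx) .* cos(2*pi*Y/Ly);      % smooth periodic test field

% circshift supplies the periodic neighbors in each direction
lap = @(w) (circshift(w,[-1 0]) - 2*w + circshift(w,[1 0]))/hx^2 ...
         + (circshift(w,[0 -1]) - 2*w + circshift(w,[0 1]))/hy^2;

% Discrete L2 norm, weighted by the cell area (assumed convention)
normL2 = @(w) sqrt(hx*hy*sum(w(:).^2));

lap_exact = -((2*pi/Lx)^2 + (2*pi/Ly)^2) * u;
rel_err = normL2(lap(u) - lap_exact) / normL2(lap_exact);   % O(h^2)
```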


Solution of Exercise 12.3 (Implicit Solver for the Unsteady Heat 
Equation) 


The script NSE_Qimp_lap.m inherits the structure of the program implementing
the explicit solver. Since the implicit method requires solving tridiagonal,
periodic systems, two functions optimizing this part were added:
NSE_ADI_init and NSE_ADI_step. The optimization starts from the observation
that in the general solver NSE_trid_per_c2D all the computations not
depending on the right-hand side fi can be done only once, outside the time
loop. This is the role of the function NSE_ADI_init, returning the vectors ami,
api, alph, xs2, which do not change during the time integration; these vectors
are used in NSE_ADI_step (called inside the time loop) to compute the
solution of the linear system for a given (time-dependent) vector fi.
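
The same init/step separation can be illustrated with built-in MATLAB tools. The sketch below is an analogue and not the book's NSE_ADI_init and NSE_ADI_step (it does not compute the vectors ami, api, alph, xs2): the matrix of one ADI sweep is factorized once with lu, and only two triangular solves are performed for each new right-hand side. Grid size, time step, and right-hand sides are arbitrary.

```matlab
% Sketch only: factorize once ("init"), then reuse the factors ("step").
n = 64; h = 1/n; dt = 0.01;                % assumed grid size and time step
e = ones(n,1);
D = spdiags([e -2*e e], [-1 0 1], n, n);
D(1,n) = 1; D(n,1) = 1; D = D/h^2;         % periodic second difference
M = speye(n) - 0.5*dt*D;                   % matrix of one ADI sweep

% "init": everything that does not depend on the right-hand side
[Lf, Uf, Pf] = lu(M);                      % Pf*M = Lf*Uf
solveM = @(rhs) Uf \ (Lf \ (Pf*rhs));

% "step": called for each new (time-dependent) right-hand side
x = (0:n-1)'*h;
res = 0;
for it = 1:5
    rhs = sin(2*pi*it*x);
    sol = solveM(rhs);
    res = max(res, max(abs(M*sol - rhs))); % largest residual over all solves
end
```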





Solution of Exercise 12.4 (Implicit Solver for the Nonlinear 
Convection—Diffusion Equation) 


The script solution NSE_Qimp_lap_nonl.m for the nonlinear problem inherits 
the structure of the previous program (NSE_Qimp_lap.m). The only difference 
is the computation of the term H at every time step by calling, inside the time
loop, the function NSE_calc_hc (file NSE_calc_hc.m). Note also that the
expression of the source term f was modified in NSE_fsource_nonl.m
to take into account the new nonlinear term in the equation.


Solution of Exercise 12.5 (Solving the Steady Heat Equation Using 
FFTs) 


Algorithm 12.4 is implemented in the script NSE_Qfft_lap.m. The script solving
a tridiagonal system is optimized by splitting the algorithm into two parts,
corresponding to the functions NSE_Phi_init and NSE_Phi_step. This is similar
to the optimization of the ADI method, with the difference that this time
the coefficients of the matrix change from one system to another (because of 
the dependence on the wave number), and consequently, they are stored in 
two-dimensional arrays. The script NSE_Q2fft_lap.m solves the same problem 
using two FFTs (along the x and y directions). 





Solution of Exercise 12.6 (Solving the 2D Navier—Stokes 
Equations) 





The main program NSE_QNS.m allows one to choose between the four 
suggested run cases. Comments in the program body identify each step 
of the numerical algorithm. The program calls the main functions written 
for the preliminary questions (NSE_ADI_init, NSE_ADI_step, NSE_Phi_init, 
NSE_Phi_step) and the following specific functions: 





• NSE_init_KH initializes the flow field for the Kelvin-Helmholtz (2D jet)
run cases;

• NSE_init_vortex builds the flow field corresponding to an individual vortex;
the vortex dipoles are obtained by superimposing the individual vortex
fields;

• NSE_visu_vort visualizes the vorticity field (color images of isocontours);

• NSE_visu_sca visualizes the passive tracer, leading to images similar to
experimental ones;

• NSE_print_div prints the divergence of the velocity field to verify whether
the computation is stable. The divergence must be close to the machine
zero value, which is of the order of $10^{-16}$ for double-precision
computations; if this is not the case, run the same computation with smaller
time steps.


It is beyond the scope of this project, which focuses essentially on numerics, 
to get into a detailed physical description of the simulated flows. We discuss, 
however, some interesting physical features illustrated in the following figures. 

The evolution of the Kelvin-Helmholtz instability is shown in Figs. 12.5
and 12.6. The Kelvin cat's-eye vortices form progressively in the two shear
regions of the jet; their spatial distribution is dictated by the wavelength
($\lambda_x$) of the initial perturbation. At this point, one might question
whether a periodic simulation could be realistic. Since in real jet flows the
instability progressively grows downstream of the injection point, our periodic
computational box may be regarded as a fixed frame that zooms in on the shear
layer region while traveling downstream with the mean velocity of the flow.
Periodic simulations offer useful information on the evolution of vortical
structures in jet or shear-layer flows that agrees very well with experimental
results. The reader who wishes to pursue this study further could attempt to
simulate the next stage of the Kelvin-Helmholtz instability, which consists in
the pairing of neighboring vortices.

The first vortex dipole run case is illustrated in Fig. 12.7. The dipole 
effectively propagates along the horizontal axis towards the right boundary; 
it could be interesting to continue the simulation and see how the periodicity 
makes the dipole reenter the computational box from the left. The velocity 
induced by the dipole triggers the movement of the passive scalar (initially 
at rest), with a nice mushroom pattern forming. Structures of this kind have
been reported in studies of flow dynamics in oceanography, meteorology, and
combustion.

The last run case (see Fig. 12.8) shows the head-on interaction between 
two dipoles of the same intensity. The result is the partner interchange with 
the formation of two new dipoles propagating in the perpendicular direction. 
The simulation may be performed for larger values of the final integration time 
to see a second collision (due to the periodicity, a dipole leaving the domain 
reenters through the opposite boundary). The reader may wonder whether 
this is a never-ending evolution! 

Other interesting run cases could be imagined and simulated with this 
Navier-Stokes solver. We refer to many existing fluid mechanics books as an 
obvious source of inspiration. 
Fig. 12.5. Run case 1. Evolution of the Kelvin-Helmholtz instability for the
perturbation wavelength $\lambda_x/L_x = 0.5$.

Fig. 12.6. Run case 2. Evolution of the Kelvin-Helmholtz instability for the
perturbation wavelength $\lambda_x/L_x = 0.25$.





Fig. 12.7. Run case 3. Evolution of a vortex dipole (panels: vorticity and
scalar at t = 0.070822, t = 0.92068, and t = 2.0538).

Fig. 12.8. Run case 4. Head-on collision of two identical vortex dipoles.